A. DIRECTOR'S OVERVIEW

This document is an annual report of the current two-year Cornell Joint Services Electronics Program (JSEP) for the period from May 1, 1988 to April 30, 1989. This is the first year since the program was broadened from an exclusive focus on compound semiconductor materials and devices to a two-theme approach. One theme continued the compound semiconductor research, now concentrating on more fundamental phenomena: femtosecond transport and optical phenomena in heterostructures. The other theme added research in a new area, real-time digital signal processing, to the program. The new objectives brought four new faculty members (C. Pollock, G. Bilardi, F. Luk and H. Torng), for a total of seven principal investigators now participating in the program.

As a consequence of these major changes in the program, new research objectives have been pursued, common ground among the new set of investigators has been charted, and collaborative research has been initiated. To define the identity of the new Cornell JSEP program, the director initiated the publication of an entire issue of the Engineering Quarterly, a periodical published by the College of Engineering, devoted to JSEP research [A New Thrust in Electronics Research, Engineering: Cornell Quarterly, Vol. 23, No. 1, Autumn 1988]. This issue was sent to the full JSEP distribution list in addition to the wide Cornell distribution to industries, businesses, firms, government organizations, foundations, alumni, faculty, and students.

Efforts to establish a new compound semiconductor growth facility at Cornell have been the main collaborative issue for the JSEP faculty during the current program period, and the new facility is expected to require continued attention well into future years of the program. The old organometallic vapor phase epitaxy (OMVPE) operation on the fourth floor of Phillips Hall had become substandard because of ever-tightening hazardous gas safety regulations.

The new growth laboratory, a shared facility operated under the technical direction of Prof. R. Shealy and an oversight committee consisting of users and administration, is being established in an existing off-campus building. Plans for the new facility call for the installation of three independent OMVPE systems to be used for specialized growth tasks: the rebuilt reactor moved from Phillips Hall, a fully operational reactor donated by General Electric Company, and a third reactor currently under construction. Much of the effort in R. Shealy's task has been devoted to establishing this new OMVPE facility, which will provide truly unique capabilities for compound semiconductor heterostructure growth. Because of the importance of this facility to JSEP and related research, the other compound semiconductor JSEP faculty have also contributed to the planning, the fund raising, and extensive discussions with the university administration.

In addition to the joint efforts centered on the new OMVPE facility, research interactions within the compound semiconductor theme have intensified. For example, the ongoing tunable femtosecond optical relaxation experiments have been designed jointly by experimental and theoretical investigators (C. Pollock and P. Krusius) in order to maximize the results from these complex measurements. Synergism among the four research groups of R. Shealy, C. Tang, C. Pollock, and P. Krusius in materials growth, device processing, femtosecond measurements, and theoretical analysis has been growing as well. This evolution should be reflected even more strongly in the results anticipated for the second year of the current program period.

The three task investigators in the present program contributing to the real-time signal processing theme, G. Bilardi, F. Luk and H. Torng, have held regular meetings, identified overlapping research areas, and started to define unifying research issues. This group has already achieved significant results, even though its members only joined the Cornell JSEP program in May 1988.


B. DESCRIPTION OF SPECIAL ACCOMPLISHMENTS AND TECHNOLOGY TRANSITION
B.1 Femtosecond Carrier Processes in Compound Semiconductors

Significant accomplishments have been achieved both in facilities and in research. Although the new off-campus multi-reactor OMVPE facility is not expected to become operational until sometime during the second year, considerable progress has been made. The planning, fund acquisition, reactor construction, and operational management structure have for the most part been completed, and interior construction is under way. Once completed, this facility is likely to provide one of the most modern and versatile OMVPE growth capabilities for a variety of compound semiconductor materials. It can also serve as a model of an OMVPE facility designed for ultra-safe handling of toxic hydride source gases.

Three new and unique femtosecond laser sources have been developed. The first, a high repetition rate UV femtosecond source, is based on intracavity frequency doubling in a β-BaB2O4 crystal. The second, a broadly tunable red to mid-IR femtosecond source, employs resonant parametric oscillation in KTiOPO4. The third is a high power color center laser tunable over the 0.7 to 0.85 eV photon energy range. In the coming years these femtosecond laser sources will be used to study the dynamics of carrier processes in a variety of compound semiconductor materials, heterostructures, and devices. In parallel, the ensemble Monte Carlo transport simulation approach has been extended to describe femtosecond optical interactions and both carrier types (electrons and holes). Significant advances in the insight into and understanding of femtosecond carrier processes are expected to accrue from the joint design of femtosecond laser experiments and their subsequent detailed microscopic analysis using this dual carrier simulation capability.
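
The color center tuning range above is quoted in photon energy, while the other sources are usually described by wavelength. A small conversion sketch, using the standard relation λ = hc/E with hc ≈ 1239.84 eV·nm (the helper names are illustrative and not taken from this report), shows that 0.7 to 0.85 eV corresponds to roughly 1.77 to 1.46 µm:

```python
"""Photon energy <-> vacuum wavelength conversion (illustrative helper)."""

HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV*nm


def ev_to_um(energy_ev: float) -> float:
    """Vacuum wavelength in micrometres for a photon energy in eV."""
    return HC_EV_NM / energy_ev / 1000.0


def um_to_ev(wavelength_um: float) -> float:
    """Photon energy in eV for a vacuum wavelength in micrometres."""
    return HC_EV_NM / (wavelength_um * 1000.0)


if __name__ == "__main__":
    # Color center laser tuning limits quoted in the text (0.7 and 0.85 eV).
    for energy in (0.70, 0.85):
        print(f"{energy:.2f} eV -> {ev_to_um(energy):.2f} um")
```

The same two functions apply to every source in this report, since only the vacuum relation between photon energy and wavelength is involved.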

Four Ph.D. degrees and one Master of Science degree have been awarded to graduate students working on JSEP research under this theme. Two of the JSEP investigators spent their sabbatical leaves at research laboratories working on problems related to JSEP research. C. Pollock worked for six months at the NRL in Washington, D.C. on new color center lasers. P. Krusius spent the entire 1988/89 academic year at IBM Research in Yorktown Heights working on hot carrier transport in the strained-layer GexSi1-x/Si materials system.

Building largely on years of leading research in optoelectronics under JSEP support, C. Tang served as the coordinator and principal investigator of a research proposal to DARPA in response to broad agency announcement BAA #89-09. This proposal called for the establishment of a National Optoelectronic Materials Center with East and West Coast Divisions, involving research groups primarily from Cornell University and the University of California, Santa Barbara.


B.2 Real Time Signal Processing


All task investigators in this theme started JSEP research just a year ago. Despite this, significant accomplishments have already been achieved. The Naval Ocean Systems Center (San Diego, CA) is building a linear algebra parallel processor based on the work of F. Luk. The Boston-based company Computational Engineering is considering building a systolic array processor, also based on F. Luk's work, for real-time analysis of airplane wing flutter for the Air Force. A patent was issued for the dispatch stack, a new method for speeding up RISC processors. H. Torng is continuing his work on the dispatch stack for real-time computing systems with multiple functional units.

One Ph.D. degree and one Master of Engineering degree were awarded to graduate students working on JSEP research under this theme.

One of the JSEP investigators, H. Torng, is organizing the first meeting of "Project 2000", an interactive partnership between academic and industrial researchers working on the high speed computers of the future. The first technical meeting is to be held on the Cornell campus on June 22-23, 1989.

C. DESCRIPTION OF INDIVIDUAL WORK UNITS

TASK PRINCIPAL INVESTIGATOR: J. Richard Shealy

(607) 255-4657


OBJECTIVE 

The materials task for the JSEP program has several objectives. The first and major goal is to extend the crystal growth technology developed in this program to allow new semiconductor structures to be prepared for use in high speed electron devices. In most cases this will require pioneering new epitaxial structures, often using novel modifications of the OMVPE process. The second objective is to prepare more standard materials, such as lattice-matched and pseudomorphic systems on InP and GaAs substrates, for characterization and device fabrication studies in this task as well as in the tasks of Professors Tang, Pollock and Krusius. The final objective is to develop an optical probing technique to characterize the properties of bulk and two-dimensional electron systems.


DISCUSSION OF STATE-OF-THE-ART

Recently, the optical absorption spectra of the organometallic species commonly used for AlGaAs growth have been determined [1]. The absorption spectra of TMG, TMA, and AsH3 are such that little or no absorption occurs at wavelengths longer than 220 nm. As a result, a potentially very important innovation in OMVPE growth of III-V alloys is the incorporation of deep UV laser excitation during growth to allow selective growth with a high degree of spatial resolution. This previous study used a broadband ArF excimer laser operating at 193 nm with a pulsed energy density of 40 mJ/cm². The pulse repetition rate was varied to control the growth rate. It was found that at substrate temperatures above 500°C, good quality GaAs films could be grown selectively, and that the growth rate could be doubled by the absorption of this modest laser power in the gas phase. With a deep UV source, the light directly excites adsorbed surface species (TMG-AsH3 adducts, for example) and stimulates growth only where the laser light is present. If the diffusion of the stimulated surface species into the dark regions can be kept below several hundred Å, a condition expected at low substrate temperatures, then interference holography can be incorporated into the UV stimulation process. In contrast, with a visible excitation source [2] the light is absorbed in the substrate bulk. The resulting thermal broadening, or the diffusion of laser-injected carriers in the semiconductor (either process has been proposed to explain selective growth behavior with an argon laser source), limits the achievable line width to more than several µm.
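
The numbers in this paragraph can be checked with elementary photon arithmetic. The sketch below assumes only E = hc/λ and the standard two-beam interference period Λ = λ/(2 sin θ) for beams crossing symmetrically at half-angle θ (the function names are hypothetical). It gives about 6.4 eV per ArF photon versus about 5.6 eV at the 220 nm absorption edge, roughly 3.9 × 10¹⁶ photons/cm² in a 40 mJ/cm² pulse, and a smallest holographic period of about 97 nm at 193 nm. That last figure suggests why surface diffusion in the dark must stay within a few hundred Å if holographic patterning is to resolve individual fringes.

```python
"""Photon arithmetic for deep UV stimulated OMVPE (illustrative sketch)."""
import math

HC_EV_NM = 1239.841984           # h*c in eV*nm
JOULES_PER_EV = 1.602176634e-19  # exact SI value


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy in eV at a given vacuum wavelength."""
    return HC_EV_NM / wavelength_nm


def photons_per_cm2(fluence_mj_per_cm2: float, wavelength_nm: float) -> float:
    """Number of photons per cm^2 delivered by one pulse of the given fluence."""
    photon_j = photon_energy_ev(wavelength_nm) * JOULES_PER_EV
    return fluence_mj_per_cm2 * 1e-3 / photon_j


def two_beam_period_nm(wavelength_nm: float, half_angle_deg: float) -> float:
    """Fringe period for two beams crossing symmetrically at +/- half_angle_deg."""
    return wavelength_nm / (2.0 * math.sin(math.radians(half_angle_deg)))


if __name__ == "__main__":
    print(f"ArF (193 nm): {photon_energy_ev(193):.2f} eV per photon")
    print(f"Absorption edge (220 nm): {photon_energy_ev(220):.2f} eV per photon")
    print(f"40 mJ/cm^2 at 193 nm: {photons_per_cm2(40, 193):.2e} photons/cm^2")
    # A 90 degree half-angle is the theoretical limit, giving the smallest period.
    print(f"Minimum period at 193 nm: {two_beam_period_nm(193, 90):.1f} nm")
```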

The conventional OMVPE process has produced many of the III-V materials and structures that find application in high speed electron devices; the vast majority of the published literature involves the AlGaAs materials system. Newly developed reactor geometries have improved the deposition uniformity to ±1% with good quality interfaces, as demonstrated by a high mobility modulation-doped heterostructure [3]. The use of the multi-chamber reaction cell [4] has demonstrated an AlGaAs/GaAs interface abruptness that approaches, and in the case of high temperature growth surpasses, that of MBE. This conclusion is based on interpretation of Raman spectra of confined optic phonon vibrations in short period superlattices [5], a method that is less open to interpretation than other commonly used techniques such as quantum well photoluminescence (PL) or TEM lattice images.

Wide bandgap III-V alloys, mainly the AlGaInP/GaAs system, have been studied and improved by the use of ethyl organometallics [6]. The first observation of crystal ordering in the quaternary AlGaInP was recently made; the ordering is similar to that observed in GaInP alloys [7]. However, the Al and Ga atoms are arranged randomly (lacking order) on a (111) plane, which is followed by a (111) plane containing predominantly In. The ethyl sources used for the growth of AlGaInP have been shown to improve the impurity doping efficiency of these wide-gap materials. Finally, the successful use of GaInP layers as the electron confinement layer in modulation-doped FETs has been reported for the first time [8]. It was shown that larger 2DEG sheet concentrations can be achieved at the GaInP/GaAs interface for the same mobility as in the AlGaAs/GaAs case. This supports the idea that the GaInP alloy, which is free of deep donor species, is a better electron supply and confinement layer for GaAs.

Many recently reported studies of the OMVPE growth of materials and device structures relate to the proposed research, in fact too many to discuss in this document, so only some of the main technical advances have been highlighted. Additional reference material is collected in the proceedings of the most recent OMVPE conference (Hakone, Japan), which appear in Vol. 93, Nos. 1-4 of the Journal of Crystal Growth.


PROGRESS 

In the past year, progress has been made on several aspects of Cornell's JSEP materials program. The program has involved growth and characterization of materials, device feasibility studies, upgrading of the "JSEP OMVPE reactor" with industrial support, and finally, the design and construction of an expanded facility to support the needs of the JSEP program as well as other programs requiring compound semiconductor materials.

Novel Heterostructures and Insulators on InP

One objective of this research is to exploit a recently discovered technique that increases the Schottky barrier height on InP from 0.5 V to over 0.8 V [9]. If this barrier is stable, high performance MESFETs can be realized that take advantage of the high electron velocity and breakdown field strength of InP. It was found that exposing the InP surface to ozone produces approximately 10 Å of native oxide, which enhances the Schottky barrier. Almost ideal I-V curves exhibiting very low leakage current have been observed [10]. In this process, an ultraviolet light source was used in dry air at room temperature, followed by a post-anneal at 300°C. An oxidation chamber with UV illumination has been constructed to extend this process to device applications.
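
Why a 0.3 V increase in barrier height matters for leakage can be illustrated with the ideal thermionic emission expression for the reverse saturation current, J_s = A**T² exp(-qφ_B/kT). The sketch below assumes purely thermionic transport with an unchanged effective Richardson constant A** (no tunneling, image-force lowering, or surface leakage), so A** and T² cancel in the ratio; the function name is hypothetical. Under those assumptions, raising φ_B from 0.5 to 0.8 eV at room temperature lowers J_s by a factor of roughly 10⁵.

```python
"""Thermionic-emission estimate of Schottky leakage reduction (illustrative)."""
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def saturation_current_ratio(phi_low_ev: float, phi_high_ev: float,
                             temperature_k: float = 300.0) -> float:
    """Return J_s(phi_high) / J_s(phi_low) for ideal thermionic emission.

    J_s = A** T^2 exp(-phi_B / kT); with the same A** and T for both
    barriers, only the exponential factor survives in the ratio.
    """
    kt_ev = BOLTZMANN_EV_PER_K * temperature_k
    return math.exp(-(phi_high_ev - phi_low_ev) / kt_ev)


if __name__ == "__main__":
    ratio = saturation_current_ratio(0.5, 0.8)
    print(f"J_s(0.8 eV) / J_s(0.5 eV) at 300 K = {ratio:.1e}")
```

Real devices will fall short of this ideal factor once tunneling and edge leakage contribute, which is why the measured I-V characteristics remain the decisive test.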

At elevated temperature in a nitrogen/oxygen ambient, the UV light interacting with the oxygen produces ozone, which reacts with the InP surface. Preliminary experiments were performed with InP samples in N2, O2, and H2 ambients at 500°C. Thin oxide films were detected when InP was annealed in the O2 atmosphere. The oxide thickness for InP annealed in the dark and under UV illumination was 200 and 350 Å, respectively. These data clearly show that a modest UV exposure (~5 mW/cm²) enhances the thermal oxidation process. At lower temperatures the ozone-driven oxidation dominates the thermally activated process. Recently we have obtained rectifying Schottky barriers on ozone-treated InP surfaces.

OMVPE Growth of Wide Bandgap III-V Alloys

Several wide bandgap materials systems have been investigated in this program for their potential suitability as electron confinement layers in high speed electron device structures. These materials are also useful for high temperature transistor operation. The systems studied include AlGaInP/GaAs and AlGaP/GaP, both grown by OMVPE. The latter is the widest bandgap alloy offered by the III-V materials, excluding the nitrides. It is worth noting that high temperature operation (550°C) of heterojunction bipolar transistors has been reported with AlGaP/GaP structures [11].

Progress to date on these materials has included optimizing the growth conditions for high quality alloys and heterostructures. The optical properties have been determined with Raman, electroreflectance, and photoluminescence spectroscopies. This study has produced the first reliable energy bandgap data on epitaxial AlGaP and AlGaInP. Finally, the microscopic features of defects caused by lattice mismatch in AlGaInP films have been studied with transmission electron microscopy.

Raman Spectroscopy of Two-Dimensional Electron Gas (2DEG) Systems

An attempt was made to characterize the energy subbands in two-dimensional electron systems as a means of characterizing the space charge transfer at modulation-doped heterojunctions. First, a series of experiments was successfully completed on samples containing multiple modulation-doped interfaces. These measurements yielded the subband energy separation in the quantum size potential troughs, but attempts to measure Raman scattering from single interfaces at room temperature failed. Unfortunately, the structure best suited to high speed device applications is the single modulation-doped heterojunction. To enhance the optical interaction between the Raman probe and the 2DEG, the feasibility of performing Raman measurements (and defining the selection rules) in a waveguide geometry was established.

With the eventual goal of examining single heterojunctions containing a 2DEG, inelastic light scattering in optical waveguides was studied. Enhanced interactions with two-dimensional electron gases are anticipated in this scattering geometry. The usual second-order perturbation theory for light scattering in bulk crystals [12] has been extended to account for the added complexity of guided modes. This approach accounts for the observed differences between the spectra of bulk crystals and waveguides. These measurements have now confirmed the theoretically predicted selection rules, and work is underway to produce epitaxial structures by MBE that contain a single 2DEG in a surface waveguide, and to measure the electronic Raman scattering from such structures at room temperature.

The potential impact of the proposed research is twofold. First, collaborations with tasks 2-4 will lead to the discovery of new semiconductor structures for improved performance of high speed electron devices. These will include wide bandgap electron confinement structures and novel structures on InP that take advantage of its intrinsic electron transport properties. Furthermore, with the successful completion of submicron selective OMVPE growth of III-V alloys, the first practical technology for producing quantum wire devices will emerge. Second, pioneering a new scattering geometry for electronic Raman spectroscopy, and the subsequent examination of two-dimensional electron systems, will allow a non-destructive optical probe to overlap electron channels in devices under operating conditions. The insight gained from such measurements will likely lead to new epitaxial structures and device geometries with improved two-dimensional electron gas transport properties.

OBJECTIVE

The basic objective of this task is to study the dynamics of nonequilibrium electrons and holes in compound semiconductors and related structures using recently developed femtosecond laser sources and measurement techniques. There have been several major breakthroughs in the development of new femtosecond laser sources in the last year or so. Femtosecond sources in the uv down to 315 nm and in the ir from 700 nm continuously to 4.5 µm have been developed for the first time. This wavelength flexibility, coupled with the femtosecond optical correlation spectroscopy and hot-luminescence spectroscopy techniques previously developed in our laboratory, will allow us to study a wide range of processes, materials, and structures.


DISCUSSION OF STATE-OF-THE-ART

Femtosecond laser sources and measurement techniques have made it possible, for the first time, to study the relaxation dynamics of nonequilibrium carriers in compound semiconductors directly in the time domain. Until recently, however, the accessible wavelength range was essentially limited to the operating range of the Rh6G/DODCI dye laser, approximately 630 nm. Nonetheless, a great deal of useful information on the relaxation dynamics of III-V compounds has been obtained for the first time using such a laser. Much remains to be done, however, and to make further progress the accessible wavelength range must be extended.

The usual approaches to extending the available femtosecond wavelength range have been to search for new dye combinations or to use femtosecond continuum generation. Despite extensive efforts at many laboratories, few dye combinations adequate for femtosecond laser applications have been found. Dye femtosecond laser sources, most of them not nearly as good as the Rh6G/DODCI laser, are available only in a few very narrow wavelength ranges in the visible. In the case of continuum generation, because the initial femtosecond laser pulses must be amplified, the repetition rate of the generated continuum is generally more than five orders of magnitude lower than that usually available from Rh6G dye lasers, and the time resolution is typically also degraded from 25 femtoseconds to several hundred femtoseconds.

Recent developments in Professor Pollock's laboratory and in our (Tang's) laboratory have led to the first truly broadly tunable femtosecond laser sources in the infrared and the first extension of femtosecond sources into the uv. Pollock's source is based on color center lasers and is tunable from 1.4 to 1.8 µm. Our infrared source is based on the optical parametric oscillator and is tunable from 700 nm to 4.5 µm at a 10⁸ Hz rate, but its power level is in the mW range, lower than that of the color center lasers. The uv source is based on the intracavity second harmonic generation technique using the new nonlinear optical crystal β-barium borate, grown and fabricated in our laboratory. This led to nearly 100% conversion of the 630 nm fundamental light to 315 nm at the same pulse repetition rate. Thus, a uv femtosecond source with characteristics comparable to those of the Rh6G laser is now available for the first time. Combined with other dye lasers, this technique should extend the accessible wavelength range in the uv down to approximately 240 nm. Together with the femtosecond optical parametric oscillators, tunable femtosecond sources are now available for studying ultrafast processes from 240 nm to 4.5 µm, which vastly extends the accessible wavelength range for studying ultrafast dynamic processes.
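
Second harmonic generation combines two fundamental photons into one, doubling the photon energy and halving the wavelength, so the uv limits quoted above follow directly from the available dye laser fundamentals. A minimal sketch (hypothetical helper names; E = hc/λ) shows that the 630 nm Rh6G fundamental doubles to 315 nm, about 3.94 eV, and that reaching approximately 240 nm would require a fundamental near 480 nm:

```python
"""Second harmonic generation wavelength bookkeeping (illustrative)."""

HC_EV_NM = 1239.841984  # h*c in eV*nm


def second_harmonic_nm(fundamental_nm: float) -> float:
    """SHG combines two fundamental photons, so the wavelength halves."""
    return fundamental_nm / 2.0


def fundamental_for_nm(harmonic_nm: float) -> float:
    """Fundamental wavelength needed to produce a given second harmonic."""
    return harmonic_nm * 2.0


if __name__ == "__main__":
    uv_nm = second_harmonic_nm(630.0)
    print(f"630 nm -> {uv_nm:.0f} nm ({HC_EV_NM / uv_nm:.2f} eV)")
    print(f"240 nm needs a {fundamental_for_nm(240.0):.0f} nm fundamental")
```

Whether a given fundamental can actually be doubled also depends on phase matching in the crystal, which this bookkeeping does not model.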

The relaxation dynamics of nonequilibrium carriers excited by 2 eV photons in bulk GaAs have been studied extensively in the last few years. It is now generally agreed that the shortest relaxation component for electrons is 30 to 40 fs, as first reported by us over three years ago, and not 13 fs as reported by the group at MIT. Our result was confirmed by Becker et al. at Bell using the 6 fs laser. Because the 6 fs laser operates at a low repetition rate and is very noisy, the data they obtained were very crude and could not resolve the individual relaxation processes as well as could be done with our high repetition rate femtosecond laser. Nevertheless, they were able to conclude that the shortest component was about 35 fs, as we reported. On the whole, the relaxation dynamics of hot electrons in bulk GaAs up to approximately 0.5 eV in the conduction band are now fairly well understood, although there may still be some unresolved inconsistencies between the conclusions reached on the basis of the time-resolved experiments and a recent low temperature cw luminescence experiment. It is generally accepted that the polar optical phonon scattering time is on the order of 150 to 200 fs and that the Γ to L deformation potential is 0.8-0.9x10^ eV/cm, as we measured. The dynamics of electrons in the Γ valley of GaAs are, therefore, well understood. The dynamics of the holes, however, are still far from understood. There have been some recent studies addressing this issue, but the picture is still far from clear. The main difficulty is that hole relaxation is expected to be even faster. With femtosecond lasers restricted to near 2 eV, the holes generated are very near the top of the valence band and relax quickly. To study the hole dynamics optically, the holes must be created far from the zone center, which requires femtosecond pulses at wavelengths shorter than 630 nm. With the new uv source we have developed, this will be possible for the first time, and it is one area that we plan to investigate in the future.
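
The statement that 2 eV excitation produces hot electrons near 0.5 eV but holes very near the top of the valence band can be made quantitative with a simple two-band estimate. For a vertical (k-conserving) transition between parabolic bands, ħ²k²/2m_e + ħ²k²/2m_h = hν - E_g, so the excess energy divides in inverse proportion to the effective masses. The sketch assumes room-temperature GaAs values E_g ≈ 1.424 eV, m_e ≈ 0.067 m₀ and a heavy-hole mass m_h ≈ 0.5 m₀; these parameters and the function name are illustrative assumptions, not values from this report. With 2 eV photons it gives roughly 0.51 eV for the electron and 0.07 eV for the hole. The parabolic model breaks down far from the zone center, so it should not be trusted for the 315 nm (about 3.9 eV) excitation.

```python
"""Two-band parabolic estimate of electron/hole excess energies (illustrative)."""


def excess_energy_split(photon_ev: float, gap_ev: float = 1.424,
                        m_e: float = 0.067,
                        m_h: float = 0.5) -> tuple[float, float]:
    """Return (electron_ev, hole_ev) kinetic energies after a vertical transition.

    Both carriers share the same wavevector k, so each kinetic energy
    hbar^2 k^2 / 2m scales as 1/m; the lighter carrier takes the larger share.
    Masses are in units of the free electron mass; defaults are assumed GaAs values.
    """
    excess = photon_ev - gap_ev
    if excess < 0.0:
        raise ValueError("photon energy is below the band gap")
    total_mass = m_e + m_h
    electron = excess * m_h / total_mass
    hole = excess * m_e / total_mass
    return electron, hole


if __name__ == "__main__":
    e_el, e_h = excess_energy_split(2.0)
    print(f"2 eV pump: electron {e_el:.2f} eV, hole {e_h:.2f} eV above band edges")
```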

In addition to bulk GaAs, the relaxation dynamics in other important materials such as GaInP, GaInAsP, etc. can now be studied for the first time with the new ir femtosecond laser sources developed at Cornell. In fact, group IV semiconductors such as Si can also be studied with our new uv source. Very little is known about the ultrafast dynamics in these materials. We expect to look at some of them in the next grant period, probably beginning with GaInP, since samples of this material are available from J. R. Shealy's group.

More important than the bulk materials are structures such as quantum wells, superlattices, and tunneling structures. Although some preliminary results have been obtained on simple GaAs/AlGaAs quantum wells and superlattices, these structures have hardly been explored because of the limited source wavelengths available, and much remains to be done. With the new ir femtosecond sources extending out to 4.5 µm, we will have an opportunity to study, for the first time, optical transitions between quantum well states on a subpicosecond time scale. The effects of an applied electric field on such structures can also be studied with the help of these sources.

There have also been some recent studies on the question of tunneling time using picosecond lasers, but the results are very crude and nonquantitative. With wavelength tunability and a time resolution of less than 100 fs, we should be able to make substantially better measurements of the tunneling time in different materials and structures than heretofore possible. This is another area in which we expect to make a unique contribution during the next grant period. The availability of suitable tunneling structures is important, however.


PROGRESS 

(1) The completion of our study on the relaxation dynamics of nonequilibrium electrons excited by 2 eV photons in GaAs and AlGaAs: With the time resolution available from our visible femtosecond laser system and the extremely accurate and versatile optical correlation spectroscopy technique, a vast amount of data on the dynamics of nonequilibrium electrons photoexcited by 2 eV femtosecond photons has been collected and analyzed. A consistent and quantitative picture of the fate of such carriers is now more or less complete. Details are described in our earlier JSEP publications (Refs. [1-6]) and in current JSEP publications [4-6].

(2) The development of the first high repetition rate uv femtosecond laser source: Highly efficient intracavity doubling of femtosecond pulses into the ultraviolet in β-BaB2O4 (BBO) has been demonstrated (JSEP publication [3]). Pulse widths down to 43 fs at a 10⁸ Hz repetition rate and outputs as high as 20 mW per arm of the femtosecond laser on a continuous-wave basis have been achieved. The ultraviolet pulse widths were determined through detailed cross-correlation measurements based on sum-frequency mixing to 210 nm in ultra-thin BBO crystals.

(3) The development of the first truly broadly tunable femtosecond laser
source from the deep red to the mid-IR (JSEP publication [7]): A singly resonant
optical parametric oscillator based on a thin crystal of KTiOPO4 is pumped by
intracavity femtosecond pulses at 620 nm from a visible femtosecond laser.
The oscillator produces stable continuous outputs of femtosecond pulses at a
10^8 Hz repetition rate and milliwatt average power levels in both the signal
and idler beams. Tuning over 820-920 nm and 1.90-2.54 μm with a single set of
mirrors has been demonstrated. With multiple sets of mirrors, continuously
tunable outputs from ~0.72 to ~4.5 μm should be possible, making this a
uniquely versatile femtosecond laser source.
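The signal and idler wavelengths quoted above are tied together by photon-energy conservation, 1/λp = 1/λs + 1/λi. The short Python check below is a sketch that assumes an exactly 620 nm pump and ignores dispersion and phase matching; it reproduces the reported idler range from the 820-920 nm signal range:

```python
def idler_wavelength_nm(pump_nm: float, signal_nm: float) -> float:
    """Idler wavelength from photon-energy conservation: 1/lp = 1/ls + 1/li."""
    if signal_nm <= pump_nm:
        raise ValueError("signal wavelength must be longer than the pump wavelength")
    return 1.0 / (1.0 / pump_nm - 1.0 / signal_nm)

PUMP_NM = 620.0  # intracavity pump wavelength quoted in the text

for signal in (820.0, 920.0):
    idler_um = idler_wavelength_nm(PUMP_NM, signal) / 1000.0
    print(f"signal {signal:.0f} nm -> idler {idler_um:.2f} um")
```

With the same 620 nm pump, a 720 nm signal corresponds to an idler near 4.46 μm, in line with the projected ~0.72 to ~4.5 μm span; signal and idler become degenerate at 1240 nm.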


POTENTIAL  SCIENTIFIC  IMPACT  OF  RESEARCH 

A clear understanding of the dynamics of highly excited nonequilibrium
carriers in semiconductors is of basic importance to solid-state physics and to
high-speed electronic and optoelectronic devices. The proposed program aims
to provide the needed information through optical studies based upon
femtosecond lasers and measurement techniques. The program is expected to
provide not only basic material parameters important for designing and
understanding the behavior of high-speed devices, but also new femtosecond
laser sources and techniques that might be used for a wide range of material
and device studies.


OBJECTIVE


The temporal relaxation of hot carriers in narrow bandgap semiconductors
has been studied using infrared pulses of 60-200 fs duration. The femtosecond
pulses are obtained from a color center laser tunable from 0.7 to 0.85 eV. This
wavelength range is directly useful for GaInAs-based materials, and the output
has been frequency doubled to provide a tunable probe in the 1.4-1.7 eV range.
Tunability with femtosecond resolution provides the unique ability to measure
relaxation rates of carriers lying anywhere between the bottom of the
conduction band and approximately the L and X valleys in certain
semiconductors. Both energy and momentum relaxation are examined in bulk
InGaAs as a function of probe energy.

The femtosecond pulses will also be used as an optical sampling probe
for measurement of the switching dynamics of device structures. The
electro-optic effect of III-V compounds will be used to sample the electrical
waveform in a device with subpicosecond resolution.


DISCUSSION  OF  STATE-OF-THE-ART 

There has been widespread application of femtosecond pulses to the
study of ultrafast phenomena in semiconductors. The techniques used for
these studies (for example, pump-probe or two-pulse correlation) are well
known, and they are the techniques we will use in this proposed work. However,
most femtosecond spectroscopy has been done using fixed-energy 2 eV photons.
Our work is distinct in its use of pulses that are at a different energy (0.8 eV)
and are tunable. In a sense, therefore, our work is establishing the state of the
art for femtosecond spectroscopy in narrow bandgap materials.

There has been extensive work on the measurement of carrier scattering
rates in GaAs, AlGaAs, and quantum well structures using these materials.
Tang (Cornell) [1], Ippen (MIT) [2], and Knox (Bell Labs) [3] have all pioneered
techniques, instrumentation, and measurements on these samples. Optical
probing is well established for determining both the rate at which carriers
relax from the initial state and the mechanisms (such as intervalley scattering)
responsible for the short-lived carrier distributions in GaAs/AlGaAs. There has
not been as much work on direct measurement of specific devices or structures
with the aim of improving carrier mobility in the material; most work to date
has focused on determining the dynamics of carrier relaxation within a given
structure.

To date, because of the lack of femtosecond sources in the 0.8 eV photon
energy range, little work has been done on the InGaAs system. Chemla (Bell
Labs) [4] has recently concluded a study of exciton absorption in InGaAs
quantum wells using 200 fs resolution pulses. His work concentrated only on
exciton interactions with the lattice and free carriers, and it confirmed earlier
theory about the scaling of binding energy with bandgap and well dimension.
There was no discussion of hot carrier dynamics.

Using magnetotransport measurements, Barlow et al. (University of
Essex, UK) [5] measured the rate at which electrons cool down to the lattice
temperature and found picosecond times for this process; no direct temporal
measurements were made. In a similar study, Kash et al. (Bell Labs) [6] used
luminescence to measure the lifetime of hot electrons in InGaAs as they cooled
to the lattice temperature over a 10 ps period. To date, there are no published
reports of femtosecond studies of the initial carrier scattering in InGaAs.


PROGRESS 

We have been working for one year under JSEP support. To date, we
have developed our NaCl color center laser to the point where it can deliver
75 fs pulses over the tuning range from 1.47 to 1.75 μm, which is an ideal range
for studying InGaAs. In addition, we have frequency doubled this source to
provide tunable 70 fs pulses in the 750-850 nm range for use in collaborative
studies with other JSEP members. We stress that the femtosecond source is
tunable: it is unique not only because its output is at approximately 0.8 eV,
but also because this output can be adjusted over a range of approximately
0.1 eV. This is in contrast to most earlier femtosecond sources.
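Wavelength and photon energy are related by E = hc/λ, with hc ≈ 1239.84 eV·nm. The following Python sketch (rounded constant; nothing here comes from the original analysis) converts the quoted tuning range and the probe wavelengths used in the InGaAs measurements to photon energies:

```python
HC_EV_NM = 1239.84  # Planck constant times speed of light, in eV*nm (rounded)

def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy in eV for a vacuum wavelength in nm (E = hc / lambda)."""
    return HC_EV_NM / wavelength_nm

# Fundamental tuning range of the NaCl color center laser, 1.47-1.75 um
e_low = photon_energy_ev(1750.0)
e_high = photon_energy_ev(1470.0)
print(f"fundamental: {e_low:.3f} to {e_high:.3f} eV, span {e_high - e_low:.3f} eV")

# Frequency doubling halves the wavelength and therefore doubles the photon energy
print(f"second harmonic: {2 * e_low:.2f} to {2 * e_high:.2f} eV")

# Probe wavelengths used for GaInAs: 1.675 um (near the gap) and 1.53 um
excess_mev = 1000.0 * (photon_energy_ev(1530.0) - photon_energy_ev(1675.0))
print(f"excess energy of the 1.53 um probe: {excess_mev:.0f} meV")
```

The span comes out near 0.135 eV around 0.8 eV, and the two probe wavelengths differ by about 70 meV, consistent with the approximate figures given in this section.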

The direct output of the NaCl femtosecond laser has been used to study
the temporal relaxation rates of hot carriers in bulk InGaAs. Graduate
students Chris Yakymyshyn and Brian Zook have been leading our effort to
study GaInAs samples (provided by the Cornell crystal growth facilities).
Using the two-pulse correlation method first developed by Prof. Tang, we have
probed the excited carrier lifetimes in GaInAs/InP wafers approximately 1 μm
thick. The probe pulses have been tuned from 1.675 μm (photons with barely
enough energy to promote an electron across the bandgap) to 1.53 μm (photon
energy about 70 meV above the bandgap, roughly equal to the energy of two LO
phonons). The results have not yet been fully analyzed or published, but the
data show clear trends. Fig. 1 shows the raw data from the two-pulse
correlation experiments. The lower trace shows the nonlinear transmission of a
sample excited with 1.53 μm photons (sufficient energy to place the carriers
about 70 meV above the conduction band minimum). The sample transmission
recovers within about 200 fs, indicating the time it takes for excited carriers
to scatter out of their initial excited states. The upper trace shows the
transmission of the sample when pumped by photons with energy near the
bandgap. The relaxation time becomes noticeably longer, and the trace displays
long-lived tails. This response is consistent with a model of carrier scattering
by LO phonons: at low excitation energy the carriers do not have enough excess
energy to relax by LO phonon emission, so they cannot cool rapidly. Our
measurements are preliminary at this point. However, Zook has developed a
good deconvolution routine for separating the optical pulse shape from the
observed relaxation, and Yakymyshyn has developed the laser and
instrumentation to the point where our signal-to-noise ratio in the wings of the
relaxation signal is on the order of 60 dB. The data we are obtaining are being
shared with Prof. Krusius' group, who are using them to compare with
numerical simulations of the carrier dynamics for this material.

FIGURE 1. Normalized optical transmission as a function of delay between
pump and probe pulses. Upper and lower traces are for carrier densities of
1.9 x 10^? cm^-3 and 1.3 x 10^? cm^-3, respectively.
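The deconvolution routine mentioned above is not described in detail. As a hedged illustration of one common approach (not Zook's routine), the Python sketch below models the transmission change as a single exponential recovery that starts at zero delay, blurs it with a Gaussian instrument function standing in for the pump-probe correlation, and fits the time constant by least squares. The single-exponential model, the Gaussian shape, and all numbers are assumptions, and coherent artifacts near zero delay are ignored.

```python
import numpy as np

def exp_response_convolved(t_fs, tau_fs, corr_fwhm_fs):
    """Exponential recovery exp(-t/tau) starting at t = 0, blurred by a Gaussian
    instrument function of the given FWHM.

    t_fs must be a uniform, odd-length grid centred on t = 0 so that
    np.convolve(..., mode="same") keeps zero delay aligned.
    """
    dt = t_fs[1] - t_fs[0]
    sigma = corr_fwhm_fs / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    gauss = np.exp(-0.5 * (t_fs / sigma) ** 2)
    gauss /= gauss.sum() * dt                     # normalize to unit area
    response = np.where(t_fs >= 0.0, np.exp(-t_fs / tau_fs), 0.0)
    return np.convolve(response, gauss, mode="same") * dt

def fit_tau(t_fs, signal, corr_fwhm_fs, tau_grid_fs):
    """Grid search over tau; the amplitude is solved by linear least squares."""
    best_tau, best_err = None, np.inf
    for tau in tau_grid_fs:
        model = exp_response_convolved(t_fs, tau, corr_fwhm_fs)
        amp = (model @ signal) / (model @ model)
        err = np.sum((signal - amp * model) ** 2)
        if err < best_err:
            best_tau, best_err = tau, err
    return best_tau

# Synthetic example with illustrative numbers (not measured data)
t = np.arange(-1000.0, 1001.0, 1.0)              # 2001 points, t = 0 at the centre
rng = np.random.default_rng(0)
data = exp_response_convolved(t, 200.0, 100.0) + rng.normal(0.0, 0.01, t.size)
print(f"fitted tau: {fit_tau(t, data, 100.0, np.arange(50.0, 505.0, 5.0)):.0f} fs")
```

Fitting a convolved model, rather than reading a time constant directly off the raw trace, keeps the pulse width from being mistaken for part of the relaxation when the two are comparable.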

Some of the relaxation we are measuring may be due to surface
recombination, so Dr. Bill Schaff is growing new MBE wafers that sandwich the
GaInAs between appropriate transparent materials to eliminate such problems.
We anticipate that by the end of the second year of this work, we will have
essentially characterized the relaxation dynamics of bulk GaInAs as a function
of carrier energy near the conduction band minimum and as a function of
carrier concentration.

The frequency-doubled femtosecond pulses have not yet been used in
our study of GaInAs, although we plan to use them in the next three-year
proposal to investigate AlGaAs structures in collaboration with Prof. R. Shealy
and Dr. B. Schaff. To date we have used the frequency-doubled source in
collaboration with Dr. Paul Tasker of Prof. Eastman's group to study the
response speed of phototransistors, and with Dr. Bill Grande of Prof. Tang's
group to study the switching speed of his novel optical switches. In the former
work, we used the femtosecond pulses to excite a GaAs MODFET phototransistor
directly and were able to determine the switching speed of the device (turn-on
time 50 ps, turn-off time 100 ps) without having to deconvolve the measured
response from comparably long electronic or optical pulses. The experiment
provided clean and precise information about the device. The work with Tang's
group was not successful, because the wavelength of the pulses we could
generate did not overlap the gain bandwidth of the devices under test.

There has also been a great deal of informal collaboration between my
group and Prof. Tang's group. The two-pulse correlation technique we are
using was pioneered by the Tang group, and we have benefited from his
experience. Also, when developing the second harmonic capability for our APM
laser, we discussed many aspects of short pulse propagation in doubling
crystals with his students, notably Walt Bosenberg.


The major goal of this research is to gain an understanding of the
carrier relaxation rate in InGaAs and InGaAs/InP quantum wells. Since this
work is being done on an essentially new material from the femtosecond
spectroscopy point of view, we expect at the bare minimum to develop an
improved understanding of the physics of alloyed III-V semiconductors.
Hopefully, these measurements will lead to faster electronic devices and faster
optical sources and modulators. InGaAs is a relatively new material known to
have a very high mobility, and the bandgap of InGaAs alloys is ideally suited to
present optical communication systems. Direct measurements of the ultrafast
behavior of material properties and of devices based on this material should
have a strong impact on the engineering and design of future optoelectronic
and electronic devices.
OBJECTIVE

The objective of this task is to explore the physics of hot carrier
transport in small, inhomogeneous, ultrahigh-speed compound semiconductor
heterostructures. Interactions with thermodynamically open boundaries,
graded material composition with embedded heterojunctions, two-dimensional
space charge phenomena, and both steady-state and transient conditions are
considered. Specific transport issues to be pursued are: ballistic carrier
transport across heterojunctions, two-carrier transport under high density and
recombination conditions, and two-carrier transport under optical interactions
with femtosecond laser probes.
Transport and optical processes of carriers in compound semiconductor
materials have been studied intensively for the past two decades. Carrier
processes for both electrons and holes in the non-interacting quasi-equilibrium
limit are well understood within the framework of linear response, and
extensive work on hot nonequilibrium carriers is being performed. Hot carrier
research has primarily focused on hot electrons because of their importance
for high speed semiconductor devices. A variety of transport and optical
methods have been used to probe the physical processes influencing hot
electron behavior. These include electron interactions with phonons,
impurities, defects, photons, device boundaries, and external electromagnetic
fields. Transport studies in bulk materials and small device structures, and in
particular recent picosecond and femtosecond optical probing techniques, have
helped to quantify the physical processes determining hot electron behavior.
It is fair to state that the understanding of hot electron behavior on all but the
shortest time and smallest spatial scales is approaching maturity.

A significant part of the hot carrier processes, namely those involving
both electrons and holes and their interactions under nonequilibrium
conditions, has, however, received little attention. Recent experimental and
theoretical indicators point to the importance of electron-hole interactions
and hole processes in the behavior of hot carriers. A few examples illustrate
this statement. Optically generated hot electron and hole distributions have
been shown to thermalize on drastically different time scales, with the
distribution function itself influencing the thermalization dynamics [1].
Transport of minority carriers in dense semiconductor plasmas has been
demonstrated to be so strongly affected by the electron-hole interaction that
negative minority carrier mobilities have been measured (electron-hole drag)
[2]. Velocity-field characteristics for minority electrons in p-type
Ga0.47In0.53As have not shown evidence of the transferred electron effect,
which reduces the average drift velocity of electrons at higher electric fields in
all III-V compound semiconductors [3]. In a recent theoretical study, the
electron-hole interaction was found to become one of the primary energy loss
mechanisms for carrier thermalization in GaAs at high carrier concentrations
[4]. While experiments on hot electron thermalization using the pump/probe
technique developed at Cornell by Tang [5] have in the past been analyzed
neglecting the contribution of holes [6], such approximations do not seem to be
justified in the general case. Much longer relaxation times were recently
measured by an IBM group in an energy-dependent cw luminescence study [7].
These latter time constants appear incompatible with previous pump and probe
measurements and their subsequent theoretical analysis [5,6]. This
discrepancy remains unresolved.

Dual carrier processes also determine the characteristics of a number of
important electronic and optoelectronic devices, such as heterostructure
bipolar transistors, photodiodes, phototransistors, and semiconductor laser
structures. Heterostructure bipolar devices have shown considerable potential
for high speed gate array applications, where their superior current drive
capability can be exploited. However, the analysis of their operation is
presently limited either to a hydrodynamic model for both electrons and holes
[8], or to a hot carrier particle formulation for electrons combined with a
hydrodynamic model for holes [9,10], without consideration of two-dimensional
phenomena. Optoelectronic devices have recently become increasingly
important because of their applications in fiber optic communication, in mixed
electronic and optoelectronic systems, and potentially also in purely optical
switching systems. While the recent literature on these devices and their
applications is too voluminous to be cited here, their operation, design, and
limitations cannot be fully understood until nonequilibrium dual carrier
transport and optical interactions are explored in inhomogeneous device
structures on a femtosecond time scale.


PROGRESS 

Research during the first year of this two-year program has focused on
two nonequilibrium carrier problems: (1) carrier transport across graded
heterojunctions, and (2) femtosecond carrier thermalization processes in
narrow band gap heterostructures. A third research issue for the current
two-year period, (3) dual carrier transport, will replace problem (1) after the
latter is completed during the second year.

1. Non-Equilibrium Carrier Transport Across Graded Heterojunctions

Hot electron injection across graded and abrupt III-V compound
semiconductor heterostructures has been explored using a self-consistent,
time-dependent ensemble Monte Carlo transport formulation. Electron bands are
described within a position-dependent k.p framework in combination with the
virtual crystal approximation to account for pseudobinary alloy effects.
Scattering processes include intra- and intervalley phonons (optical and
acoustic), ionized impurities (Ridley screening), and the alloy effect (chemical
disorder, Harrison and Hauser formulation). Scattering rates were calculated
within the k.p theory including all overlap integrals. This transport
formulation, described in detail in an earlier and a current JSEP publication
(see Ref. [11] and JSEP publication [1]), has during the current period been
implemented in a two-dimensional time-dependent computer code for
nonequilibrium electron injection studies. This code, 2D-TCMC, allows one to
simulate the behavior of nonequilibrium electrons in a rectangular domain
composed of several compound semiconductor regions, including compositional
grading and embedded abrupt or graded heterojunctions. Ohmic and Schottky
contacts can be placed anywhere on the periphery of the rectangular domain
and, with a small amount of rework, also in the interior. Ohmic contacts are
described via an interaction with external thermal reservoirs using correct
injection statistics. Particle conservation is not explicitly enforced.
Two-dimensional space charges are fully included. Charge assignment is
performed using the cloud-in-cell method, and Poisson's equation for the
ensemble is solved using Hockney's Fast Fourier Transform based technique.
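The two field-solver steps named above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the 2D-TCMC implementation: it uses a uniform grid with periodic boundaries and a spectral (continuous-k) Laplacian, whereas Hockney's technique as applied to a device domain must also handle contact and boundary conditions that are not modeled here.

```python
import numpy as np

def cic_deposit(x, y, q, nx, ny, h):
    """Cloud-in-cell charge assignment on an nx-by-ny grid with spacing h.

    Each particle's charge is shared among the four surrounding grid nodes
    with bilinear weights. Indices wrap periodically (a sketch assumption).
    Returns charge per unit area.
    """
    rho = np.zeros((nx, ny))
    gx, gy = np.asarray(x) / h, np.asarray(y) / h
    i0, j0 = np.floor(gx).astype(int), np.floor(gy).astype(int)
    fx, fy = gx - i0, gy - j0
    for di, wx in ((0, 1.0 - fx), (1, fx)):
        for dj, wy in ((0, 1.0 - fy), (1, fy)):
            # np.add.at accumulates correctly when several particles share a node
            np.add.at(rho, ((i0 + di) % nx, (j0 + dj) % ny), q * wx * wy)
    return rho / h**2

def poisson_fft_periodic(rho, h, eps):
    """Solve laplacian(phi) = -rho / eps on a periodic grid with 2D FFTs."""
    nx, ny = rho.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=h)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=h)
    k2 = kx[:, None] ** 2 + ky[None, :] ** 2
    rho_k = np.fft.fft2(rho)
    phi_k = np.zeros_like(rho_k)
    nonzero = k2 > 0.0                 # drop k = 0: net charge is neutralized
    phi_k[nonzero] = rho_k[nonzero] / (eps * k2[nonzero])
    return np.real(np.fft.ifft2(phi_k))

# Usage with a hypothetical ensemble of 1000 equal charges on a 64 x 32 grid
rng = np.random.default_rng(0)
nx, ny, h = 64, 32, 1.0
px, py = rng.uniform(0.0, nx * h, 1000), rng.uniform(0.0, ny * h, 1000)
rho = cic_deposit(px, py, np.full(1000, 1.0), nx, ny, h)
phi = poisson_fft_periodic(rho, h, eps=1.0)
print(rho.sum() * h**2, phi.shape)
```

Bilinear (cloud-in-cell) weights conserve the total deposited charge exactly, which the first printed value checks.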

Hot electron transport processes across laterally uniform
one-dimensional and laterally nonuniform two-dimensional graded
heterojunctions have been explored. It was found that ensemble phenomena
dominate the carrier injection process across the heterojunction, influencing
distribution functions, drift velocities, and ballistic carrier fractions. Space
charges and current continuity play a crucial role. In a study of hot electron
injection across one-dimensional heterojunctions in the AlxGa1-xAs/GaAs
materials system, it was established that drift velocities downstream from the
heterojunction can vary by a factor of four depending on the state of the
space charge at the junction. The injection efficiency can be influenced via
grading, doping density, temperature, and applied voltage (JSEP publication
[2]). Flat band conditions, desirable for high current drive device applications,
are not attainable for any applied voltage if the injection junction has not
been correctly designed, an important finding for high speed device design
(JSEP publications [2,3]). Lateral electrodes can be used to shape the space
charge at the heterojunction in two-dimensional device structures. The
placement of lateral control electrodes can change the device current by almost
an order of magnitude for a similar current modulation capability.
Two-dimensional space charge phenomena have been studied in vertical FET
(VFET) devices (JSEP publication [4]). Both steady-state and transient ballistic
carrier transport across heterojunctions have been shown to be controlled by
lateral space charges. Our findings also explain why measured cutoff
frequencies have not reached expected values (JSEP publications [5,6]).

Based on these results, it is now possible to understand hot electron
injection in a variety of compound semiconductor devices. Optimization issues
for a class of heterostructure unipolar devices, including the VFET, the
permeable base transistor (PBT), and the planar doped barrier transistor
(PDBT), are currently being studied through simulations on the IBM 3090-600E
supercomputer in order to predict the ultimate high speed potential of these
devices (JSEP publications [7,8]).

2.  Femtosecond  Carrier  Thermalization 

Femtosecond carrier thermalization is being explored in collaboration
with Pollock's research group, focusing on the narrow gap GaxIn1-xAs/InP
heterostructure system. Femtosecond pump/probe experiments with the
unique tunable color center laser are designed jointly. Pollock's group is
performing the femtosecond measurements, while theory and data analysis are
performed within this task. This collaboration allows experiment and theory to
interact during all stages of the research and thus maximizes the yield of
results. Initial experiments on In0.53Ga0.47As/InP films are currently in
progress (see task #3).
The physics of hot carrier thermalization in these thin films is
described with a dual carrier ensemble Monte Carlo formulation including both
electrons and holes. Because the films are homogeneous and ambipolar
diffusion is slow compared to the thermalization times, it is not necessary to
include spatial inhomogeneities. Electron and hole bands are again calculated
within the k.p formulation, with corrections for higher bands included through
second order perturbation theory. Because of the multiplicity and warping of
the valence bands, these expressions are quite involved. Both acoustic and
optical phonons are included inelastically through the long range (polar,
piezoelectric) and short range (deformation potential) interactions.
Electron-electron, hole-hole, and electron-hole scattering are included, with
the simplification that carriers are not permitted to scatter from the valence
bands to the conduction bands, or vice versa. Scattering between valence bands
is fully included. All interactions are statically screened. For this pseudobinary
materials system it is necessary to include both chemical and structural
disorder when treating alloy scattering, because of local anion-site-related
bond length and angle distortions. This can be accomplished using the
molecular coherent potential approximation [12]. Optical interactions are
included within first order time-dependent perturbation theory with full
inclusion of photon polarization effects. Derivations of all energy-dependent
scattering rates and polarization-dependent optical interactions have been
completed, along with the k.p band formulation. Computer code is being
developed for the simulation of the tunable pump/probe experiment, including
photon polarization effects. Considerable leverage is derived from the 2D-TCMC
code developed for nonequilibrium electron transport studies (see section 1).
We expect that this code development and the analysis of the first
measurement will be completed before the end of the current calendar year.
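An ensemble Monte Carlo formulation of this kind advances each carrier through alternating free flights and scattering events. A standard way to sample these when the rates depend on energy is the self-scattering (constant total rate Γ0) technique, illustrated in the Python sketch below. It is a generic textbook construction, not code from 2D-TCMC or from the dual carrier program, and the rate values in the usage lines are hypothetical placeholders.

```python
import numpy as np

def sample_flight_times(gamma0, n, rng):
    """Free-flight durations for a constant total rate gamma0 (self-scattering)."""
    # 1 - U lies in (0, 1], which keeps log() finite
    return -np.log(1.0 - rng.random(n)) / gamma0

def select_mechanism(rates, gamma0, rng):
    """Pick the mechanism that ends a free flight.

    rates: real scattering rates (1/s) at the carrier's current energy.
    Returns the mechanism index, or -1 for a fictitious self-scattering
    event that leaves the carrier's state unchanged.
    """
    cumulative = np.cumsum(rates)
    if cumulative[-1] > gamma0:
        raise ValueError("gamma0 must bound the total scattering rate")
    r = rng.random() * gamma0
    idx = int(np.searchsorted(cumulative, r, side="right"))
    return idx if idx < len(rates) else -1

# Hypothetical rates at one carrier energy: phonon emission, phonon absorption, alloy
rates = np.array([5e12, 1e12, 2e12])
gamma0 = 1e13
rng = np.random.default_rng(1)
flights = sample_flight_times(gamma0, 100_000, rng)
picks = [select_mechanism(rates, gamma0, rng) for _ in range(20_000)]
print(f"mean flight {flights.mean():.2e} s (1/gamma0 = {1 / gamma0:.0e} s)")
print({m: picks.count(m) / len(picks) for m in (0, 1, 2, -1)})
```

Because the fictitious self-scattering events pad the total rate up to Γ0, flight times can be drawn from a single exponential distribution even though the real rates vary with carrier energy.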

With the adopted synergistic experimental and theoretical design, the
energy dependence of the dual carrier thermalization dynamics can be studied
without condensing the data into a set of effective exponential time constants,
a considerable advantage compared to earlier pump/probe studies.

3. Dual Carrier Transport

As mentioned above, the treatment of dual carrier transport (band
structure, carrier dynamics, and scattering processes) will be based on work
performed in areas (1) and (2). In addition, it will be necessary to develop a
formulation for impact ionization. A simple threshold-dependent model derived
by Keldysh will be used as the starting point. Space charge effects and carrier
plasma interactions can be included in a natural fashion because of the
self-consistency and the adopted microscopic model for device boundaries.
Work in this area will begin during the second year of the current program,
after the completion of the carrier injection studies.
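For reference, a commonly quoted form of the Keldysh impact ionization rate is W_ii(E) = P · W_ph(E_th) · ((E - E_th)/E_th)^2 for E > E_th and zero otherwise, where E_th is the threshold energy, W_ph(E_th) the phonon scattering rate at threshold, and P a dimensionless fitting parameter. The Python sketch below evaluates that form; the expression is taken from the general literature as an assumption, not as the formulation this task will adopt, and the parameter values are hypothetical.

```python
def keldysh_rate(energy_ev, threshold_ev, p_factor, phonon_rate_at_threshold):
    """Keldysh-type impact ionization rate in 1/s; zero at or below threshold."""
    if energy_ev <= threshold_ev:
        return 0.0
    excess = (energy_ev - threshold_ev) / threshold_ev
    return p_factor * phonon_rate_at_threshold * excess ** 2

# Hypothetical parameters, for illustration only
E_TH = 1.0      # threshold energy, eV
P = 0.5         # dimensionless Keldysh parameter
W_PH = 1e14     # phonon scattering rate at threshold, 1/s

for e in (0.9, 1.2, 1.5, 2.0):
    print(f"E = {e:.1f} eV -> W_ii = {keldysh_rate(e, E_TH, P, W_PH):.2e} 1/s")
```

In a Monte Carlo code, a rate of this kind would simply be added to the list of energy-dependent mechanisms sampled at the end of each free flight.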


POTENTIAL SCIENTIFIC IMPACT OF RESEARCH


In order to fully understand femtosecond electronic and optical
processes in compound semiconductors, it is necessary to examine
nonequilibrium electrons and holes simultaneously. The dual carrier ensemble
particle methods developed in this task will allow us to analyze these
processes in full detail in realistic inhomogeneous heterostructures. Although
these methods will be extremely complex and take a considerable amount of
time to develop, they are necessary for the unambiguous interpretation of
femtosecond laser measurements of carrier processes in compound
semiconductor heterostructures. Once these methods have been developed,
correlations with optical and transport measurements performed, and their
validity established, they can be applied to the analysis, design, and
optimization of a large number of ultrafast electronic and optoelectronic
devices.
OBJECTIVE

The goal of this research is the design of algorithms, architectures, and
layouts of special-purpose VLSI systems for very fast signal processing. The
objective is to obtain circuits that make optimal use of silicon, achieving the
maximum data rate possible for a given amount of silicon area. It is important
to identify the factors that limit performance and to obtain quantitative
expressions for such limitations.


DISCUSSION OF STATE-OF-THE-ART

About a decade ago, a VLSI model of computation was proposed [1,2] to
capture the essential features of VLSI as a computing medium and to allow for
mathematical analysis of chip design. The performance of a VLSI circuit has
generally been measured in terms of the chip area A and the computation time
T. The tradeoff between these two measures has been investigated for many
computational problems (see [3] for some examples). In the process,
considerable knowledge has been gained on algorithmic, architectural, and
layout issues arising in the design of VLSI systems.

In the field of signal processing, the area-time tradeoff of basic
operations such as convolution [4] and the discrete Fourier transform (see
[5,6]) has been studied extensively. An investigation of the VLSI complexity of
digital filtering was initiated in [7], although much remains to be done in this
direction.

The design of a special-purpose VLSI system typically exploits the
properties of the particular operation to be performed. In spite of many case
studies, few general design principles have emerged. Progress in this area
would be very desirable, since it could simplify the design process
considerably.

In the next section, we report on the progress made in the past year,
both on the study of specific signal processing operations and on the study of
general questions that relate to the discipline of VLSI design.


PROGRESS 

We  summarize  below  some  of  the  main  findings  and  directions  of  our 
research.  A  more  detailed  account  can  be  found  in  the  references  listed  in  the 
section  JSEP  Publications. 
Filtering and Prefix Computations.

One of the main targets of our work is a complete characterization of the
area/data-rate tradeoff of digital filtering. An early study [7] indicated that
the twisted-reflected-tree (TRT) is very efficient for the execution of some
high data-rate algorithms for digital filtering. The TRT is also the architecture
of choice for a general class of problems known as prefix computations.

The desire to obtain a deeper understanding of the relationship between
filtering and prefix computation has motivated an extensive investigation of
the latter. Various resource tradeoffs have been completely characterized in
terms of algebraic properties of the semigroup underlying the prefix
computation (Ref. [8], JSEP publication [5]). These results are of independent
interest, since variants of the prefix operation (e.g., fetch-and-add on the
Ultracomputer, scan on the Connection Machine, and multiprefix on the
Fluent Machine) play an important role in parallel computing.
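The prefix operation is easy to state. Given inputs x_1, ..., x_n and an associative operator, it computes every y_k = x_1 op x_2 op ... op x_k. The minimal Python sketch below is a generic illustration only. It is not the TRT design and not the semigroup-based constructions of Ref. [8]. It simulates the recursive-doubling scheme: after round j, each position holds the combination of up to 2^j consecutive inputs ending there, so all n prefixes are obtained in ceil(log2 n) parallel rounds. The function name `parallel_prefix` is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def parallel_prefix(xs: Sequence[T], op: Callable[[T, T], T]) -> Tuple[List[T], int]:
    """Simulate recursive-doubling prefix; return (prefixes, number of rounds).

    `op` must be associative but need not be commutative, so the left
    operand is always the earlier segment of the input.
    """
    y = list(xs)
    n = len(y)
    rounds = 0
    d = 1
    while d < n:
        # One parallel round: every position i >= d combines the partial
        # result ending at i - d with its own partial result.
        y = [y[i] if i < d else op(y[i - d], y[i]) for i in range(n)]
        d *= 2
        rounds += 1
    return y, rounds

print(parallel_prefix([3, 1, 4, 1, 5, 9, 2, 6], lambda a, b: a + b))
# ([3, 4, 8, 9, 14, 23, 25, 31], 3)
```

With string concatenation as the operator, which is associative but not commutative, the same routine returns ["a", "ab", "abc", "abcd"] for ["a", "b", "c", "d"]. This illustrates why only the semigroup (associativity) property is needed.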

One  interesting  finding  has  been  that  the  TRT  is  not  an  optimal  network 
for  all  prefix  computations.  In  fact,  there  is  a  large  class  of  semigroups  whose 
prefix  problem  can  be  solved  on  a  more  compact  binary-tree  network.  An 
algebraic  characterization  of  this  class  has  been  developed. 

Recently, we have been able to establish a connection between filtering
and prefix computation by introducing the notion of a universal filter: a circuit
with two types of input, namely the signal to be processed and the description of
the filter to be applied to that signal. Clearly, a specific filter can always be
obtained as a specialization of a universal one. We have shown that, assuming
infinite precision, a universal filter can be viewed as performing a prefix
computation over a certain semigroup. We are currently exploring extensions
of these results to finite-precision computation.
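A standard example shows how a filter whose description is an input can reduce to a prefix computation. It is offered only as an illustration and is not necessarily the semigroup used in our construction. The first-order recursive filter y[n] = a[n]*y[n-1] + x[n] maps y[n-1] to y[n] by the affine map (a[n], x[n]). Composition of affine maps is associative, so a prefix over these maps yields every output. Because the coefficients a[n] arrive as inputs alongside the signal, one circuit serves every filter of this form. All names below are hypothetical. Exact rational arithmetic (Python's `fractions`) stands in for the infinite-precision assumption.

```python
from fractions import Fraction
from itertools import accumulate
from typing import List, Tuple

Affine = Tuple[Fraction, Fraction]  # (a, b) represents the map y -> a*y + b

def compose(f: Affine, g: Affine) -> Affine:
    """Return 'apply f, then g' as a single affine map; this operation is associative."""
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)

def filter_by_prefix(a: List[Fraction], x: List[Fraction], y0: Fraction) -> List[Fraction]:
    """Outputs of y[n] = a[n]*y[n-1] + x[n], obtained from prefixes of affine maps."""
    maps = list(zip(a, x))
    # itertools.accumulate is sequential here; any prefix algorithm for an
    # associative operator (e.g., a parallel one) could be substituted.
    prefixes = accumulate(maps, compose)
    return [pa * y0 + pb for pa, pb in prefixes]

def filter_direct(a: List[Fraction], x: List[Fraction], y0: Fraction) -> List[Fraction]:
    """Reference implementation: run the recurrence step by step."""
    out, y = [], y0
    for an, xn in zip(a, x):
        y = an * y + xn
        out.append(y)
    return out
```

For a = [1/2, 2, -1, 3/4], x = [1, 0, 5, -2] and y0 = 4, both functions give [3, 6, -1, -11/4].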

Multidimensional Signal Processing.

VLSI signal processing can be extended to multidimensional signals. In
JSEP publication [3] we have taken a step in this direction for the
multidimensional discrete Fourier transform. The cases of complex arithmetic and
modular arithmetic have both been investigated. Area-time optimal designs
have been developed for a wide range of computation times. Previously
published lower bounds on area-time performance were based on fallacious
arguments, so completely new arguments have been developed to establish
performance lower bounds.
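The multidimensional DFT is separable, so it can be computed as 1-D DFTs along each axis in turn. This is the property that makes the multidimensional case tractable. The minimal sketch below checks the row-column decomposition of a 2-D DFT against the direct definition. It illustrates only the arithmetic identity and says nothing about the area-time optimal layouts of JSEP publication [3]. The function names are hypothetical.

```python
import cmath
from typing import List

Matrix = List[List[complex]]

def dft1(v: List[complex]) -> List[complex]:
    """Direct 1-D DFT: X[k] = sum_j v[j] * exp(-2*pi*i*j*k/n)."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft2_direct(x: Matrix) -> Matrix:
    """Direct 2-D DFT from the definition, for comparison."""
    m, n = len(x), len(x[0])
    return [[sum(x[a][b] * cmath.exp(-2j * cmath.pi * (a * p / m + b * q / n))
                 for a in range(m) for b in range(n))
             for q in range(n)] for p in range(m)]

def dft2_row_column(x: Matrix) -> Matrix:
    """2-D DFT as 1-D DFTs of all rows, then of all columns."""
    rows = [dft1(r) for r in x]                  # transform every row
    cols = [dft1(list(c)) for c in zip(*rows)]   # then every column
    return [list(r) for r in zip(*cols)]         # transpose back to row-major
```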

The results published in JSEP publication [3], like almost all
studies of DFTs, make specific assumptions about the factorability of the size
of the transform. In recent unpublished work we have succeeded in
constructing optimal circuits for the general case.

Distributed  Implementation  of  Shared  Memory. 

The  design  of  a  special-purpose  VLSI  system  typically  involves  the 
choice  of  a  suitable  parallel  algorithm  and  architecture  for  the  desired  task. 
The architecture should support the execution of that algorithm well and have
the smallest layout compatible with this requirement. It is often the case that
parallel  algorithms  have  already  been  proposed  in  the  literature.  However, 
most  algorithms  are  developed  for  a  shared-memory  model  of  computation 
such  as  the  parallel  random  access  machine  (PRAM).  It  would  be  of  great 
interest  if  one  could  automatically  transform  a  PRAM  algorithm  into  a  VLSI 
algorithm. 

As a step in this direction, together with Kieran Herley (a graduate
student supported by JSEP), we have studied the problem of simulating a PRAM
on a bounded-degree network, a model better suited to VLSI implementation.
Optimal simulation schemes have been obtained in JSEP publication [2] for the
case in which the memory size grows at least polynomially in the
number of processing elements. Further results for the case of smaller memory
have been obtained recently by Herley in JSEP publication [6]. It should be
mentioned that this work is technically difficult and that further progress
depends on the solution of some deep questions in graph theory.

Lower  Bounds. 

Communication  is  often  the  critical  factor  limiting  the  speed  of  parallel 
algorithms.  In  JSEP  publication  [4],  the  constraints  on  the  computation  time 
posed  by  propagation  of  information  are  analyzed  for  a  specific  class  of 
functions. The goal is to gain a better understanding of which properties of a
function constrain its parallel computation time. A general technique is
developed to obtain lower bounds on the parallel computation time of monotone
Boolean functions in terms of the length of their largest prime implicant or
prime clause.
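The following sketch makes the quantity in this bound concrete. For a monotone Boolean function, the prime implicants correspond to the minimal sets of variables whose being 1 forces the function to 1. The brute-force Python sketch below lists them and reports the longest. It is exponential in n, so it suits only small examples, and its names are hypothetical. It does not reproduce the lower-bound technique of JSEP publication [4].

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

def prime_implicants(f: Callable[[FrozenSet[int]], bool], n: int) -> List[FrozenSet[int]]:
    """Minimal true sets of a monotone f over variables 0..n-1.

    f(S) is the function value when exactly the variables in S are 1.
    Sets are visited in order of increasing size, so a true set is a prime
    implicant exactly when no previously found prime implicant is inside it.
    """
    primes: List[FrozenSet[int]] = []
    for size in range(n + 1):
        for s in map(frozenset, combinations(range(n), size)):
            if f(s) and not any(p <= s for p in primes):
                primes.append(s)
    return primes

# Example: (x0 AND x1 AND x2) OR x3 has prime implicants {x3} and {x0, x1, x2}.
g = lambda s: {0, 1, 2} <= s or 3 in s
print(max(len(p) for p in prime_implicants(g, 4)))  # 3
```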


SCIENTIFIC IMPACT OF RESEARCH

The work on prefix computation has the potential for important
applications, not only to special-purpose VLSI structures, but also to
general-purpose parallel computers. Indeed, a number of parallel programming
languages already provide prefix as a primitive operation (often under a
different name), and some machines support the operation in hardware.

The  relationship  between  prefix  and  filtering  that  we  have  established 
is likely to lead to a new perspective on the filtering problem. We are already
exploring this perspective, which looks very promising.

Fourier  transform  techniques  are  fundamental  in  signal  processing. 
Our optimal circuits for the multidimensional Fourier transform should find
applications in multidimensional filtering, among other areas.

The  problem  of  translating  shared-memory  algorithms  into  distributed- 
memory  algorithms  is  of  fundamental  importance.  Any  implementation  of  a 
large  memory  is  bound  by  technological  constraints  to  be  a  distributed  one. 
Thus,  the  work  reported  in  JSEP  publications  [2]  and  [6]  has  consequences  for 
VLSI  implementation  of  shared-memory  algorithms. 


DEGREES 

Kieran  Herley,  Ph.D.  expected  in  1989. 
OBJECTIVE

Systolic arrays have emerged as a preferred means for performing the
many matrix computations central to real-time signal processing [12]. Indeed,
the Naval Ocean Systems Center in San Diego is building a machine, the
Systolic Linear Algebra Parallel Processor (SLAPP) [3], based on our
theoretical design. Undoubtedly, systolic arrays will also be used in a great many
other environments. What we need to do, then, is provide a wide range of
cost-effective fault tolerance options to meet the varying needs of all users of
systolic arrays. What are these needs? At a minimum, error detection is
necessary, because errors in matrix computations are essentially undetectable
by mere examination of the results. For the real-time environment, we need
totally reliable arrays that do not reduce the throughput of the original array.

Existing fault tolerance methods can be divided into three categories:
concurrent error detection followed by reconfiguration, error masking, and
error-correcting data encoding. Reconfiguration schemes are extremely
powerful techniques that tolerate any pattern of errors, permanent or
transient. However, the performance loss caused by concurrent error detection,
reconfiguration, and rollback makes reconfiguration too debilitating for the
real-time environment, while its hardware redundancy and complexity make
it too costly for any other. Error masking schemes are superior in that
continuous system operation is provided. However, the apparently necessary
tripling or quadrupling of hardware is extremely costly, and error masking
schemes are vulnerable to certain patterns of errors. Encoded data is attractive
for its low time and hardware overheads, but its error detection and correction
capabilities for multiple errors seem limited.

Which technique can provide the choices we need? We advocate the use
of a variety of methods, each with its own strengths and weaknesses. In
particular, we favor algorithm-based fault tolerance, virtual redundancy, and
pair and spare. These techniques, used either individually or in
combination, will provide a rich source of cost-effective fault-tolerant systolic
arrays.

In  this  task,  we  are  primarily  concerned  with  hard  and  soft  processor 
calculation  errors.  We  ignore  transmission  and  memory  errors  because 
effective  error  correcting  codes  such  as  Hamming  codes  already  exist  to  deal 
with  these  errors.  We  propose  to  examine  critically  both  existing  and  new 
techniques  for  achieving  fault  tolerance,  first  in  systolic  arrays  and  then  in 
real-time  signal  processing  systems.  Further,  we  propose  to  formulate  and 
evaluate  schemes  for  fault-tolerance  in  such  arrays  and  systems,  including 
the digital filtering structures being investigated by Bilardi and the
configurations with multiple functional units being studied by Torng.
DISCUSSION OF STATE-OF-THE-ART

Algorithm-based fault tolerance, proposed by Jacob Abraham and his
students at the University of Illinois [1,2,4,5], is a technique specially tailored
for systolic algorithms and architectures. By encoding the input data as
checksums and by redesigning algorithms for the encoded data, one can detect
and correct transient errors that have occurred during the computations. This
approach requires very low overhead and uses simple arithmetic; Abraham
et al. apply it to basic operations such as matrix-matrix multiplication, LU
decomposition, and matrix inversion. For correcting a transient error in
Gaussian elimination, they propose a computation rollback to the point just
before the error occurred. However, rollbacks are hard to execute on systolic
arrays. In [8] we show how to avoid rollbacks by computing the correct
decomposition from the erroneous one.
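The sketch below shows the checksum idea for matrix multiplication. It follows the full-checksum scheme of Huang and Abraham [4] as it is commonly described, assumes exact integer arithmetic, and uses hypothetical function names. A column-sum row is appended to A and a row-sum column to B, so the product carries both checksums. A single corrupted data entry of the product then shows up as exactly one inconsistent row and one inconsistent column. Their intersection locates the error, and the row discrepancy corrects it.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def column_checksum(a: Matrix) -> Matrix:
    """Append a row holding the sum of each column."""
    return a + [[sum(col) for col in zip(*a)]]

def row_checksum(b: Matrix) -> Matrix:
    """Append to each row the sum of that row."""
    return [row + [sum(row)] for row in b]

def check_and_correct(c: Matrix) -> Optional[Tuple[int, int]]:
    """c is the (n+1) x (m+1) full-checksum product; fix one bad data entry in place.

    Returns the (row, col) corrected, or None if every checksum agrees.
    """
    n, m = len(c) - 1, len(c[0]) - 1
    bad_rows = [i for i in range(n) if sum(c[i][:m]) != c[i][m]]
    bad_cols = [j for j in range(m) if sum(c[i][j] for i in range(n)) != c[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("error pattern not correctable by this sketch")
    i, j = bad_rows[0], bad_cols[0]
    c[i][j] += c[i][m] - sum(c[i][:m])   # restore the row sum
    return (i, j)
```

An error that strikes a checksum entry itself, or more than one error, is reported here rather than corrected. In floating point arithmetic the exact comparisons would have to be replaced by a tolerance test.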

In [10] we extend Abraham's checksum scheme for the LU decomposition
to a unified scheme for three different triangularization procedures:
LU decomposition, Gaussian elimination with pairwise pivoting, and QR
decomposition. We show how to represent the error as a rank-one perturbation to
the data, and develop a new error model in which the occurrence time of the
error is not involved.

Although the checksum scheme works well in exact arithmetic (an
inconsistent checksum indicates the presence of a transient error), in floating
point arithmetic an undesirable growth of rounding errors may cause
confusion. There is a need to establish a tolerance to decide whether an
inconsistent checksum was caused by a (large) transient error or by roundoff. In
[9] we analyze the effects of rounding errors on the checksum scheme and establish
a tolerance for transient error detection. Furthermore, we show that the
tolerance is necessarily large for the LU decomposition and for Gaussian
elimination with pairwise pivoting, but is acceptably small for the QR
decomposition.

The guiding principle of virtual redundancy is the same as that of any
space redundancy technique: make the same calculation on different
processors and compare results to detect and correct errors. However, instead of
replicating the hardware, one replicates the data and takes advantage of idle
processors to make the redundant calculations (cf. Kim and Reddy [6]).

In  a  "pair  and  spare"  configuration,  there  are  two  pairs,  say  A  and  B,  of 
processors.  While  both  processors  of  pair  A  agree  and  both  processors  of  pair 
B  agree,  the  system  uses  the  results  of  pair  A.  If  either  pair  disagrees,  the 
results of the agreeing pair are used while a signal is sent to maintenance
alerting it to a critical state. We define a critical state to be a state in which
one more error may cause the system to produce faulty results. While either
pair is being repaired, the other pair is used to run the computer. This scheme
is used by Stratus Computer Corp. in its nonstop systems.
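The selection rule just described fits in a few lines. The minimal Python sketch below returns the result to use and whether the system has entered a critical state. The case in which both pairs disagree internally is not covered by the description above. Returning no trusted result in that case is our assumption, and the function name is hypothetical.

```python
from typing import Optional, Tuple, TypeVar

T = TypeVar("T")

def pair_and_spare(a1: T, a2: T, b1: T, b2: T) -> Tuple[Optional[T], bool]:
    """Return (result to use, critical-state flag) for one computation step."""
    a_ok = a1 == a2
    b_ok = b1 == b2
    if a_ok and b_ok:
        return a1, False   # normal operation: use the results of pair A
    if a_ok:
        return a1, True    # pair B disagrees: use pair A and alert maintenance
    if b_ok:
        return b1, True    # pair A disagrees: use pair B and alert maintenance
    return None, True      # assumption: no agreeing pair, hence no trusted result
```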


PROGRESS

We introduced a new algorithm-based fault tolerance technique specifically
designed for use in recursive least squares minimization. By monitoring
a single scalar x(n), we obtain error detection. The same quantity can
also be used as an indicator for correction. This technique applies equally
effectively to many other systolic arrays, as it provides the same properties
for the QR decomposition phase of the algorithms. Our scheme does not
require the summation process during the decoding stage; it requires only the
monitoring of x(n). Furthermore, because data flow into the array in a
wavefront pattern, x(n) is the only reasonable quantity to examine
continually. The chief attribute of our technique is its remarkable simplicity.

Error correction has proved to be a much more difficult problem to
solve than error detection when weighted checksums are used. We provided a
theoretical basis for the correction problem. We showed that for a distance d+1
weighted checksum scheme, if at most floor(d/2) errors occur, then we
can determine exactly how many errors have occurred. We further
demonstrated that in this case we can correct the errors, and gave a procedure
for doing so.
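The simplest instance of this result is d = 2, a distance-3 code that corrects floor(2/2) = 1 error. The sketch below is a generic textbook construction, not the correction procedure from our work. It uses an unweighted checksum and a checksum with weights 1, 2, ..., n, both taken modulo a prime p > n so that every weight is invertible. The data are assumed to be residues modulo p and the checksums are assumed error-free. A single error e at position k produces syndromes S0 = e and S1 = k*e (mod p), so k = S1 * S0^(-1) mod p. All names are hypothetical.

```python
from typing import List, Optional, Tuple

def encode(x: List[int], p: int) -> Tuple[int, int]:
    """Checksums with weights 1 and j (positions 1..n), taken modulo the prime p > n."""
    s0 = sum(x) % p
    s1 = sum(j * v for j, v in enumerate(x, start=1)) % p
    return s0, s1

def correct_single_error(y: List[int], checks: Tuple[int, int], p: int) -> Optional[int]:
    """Correct at most one corrupted entry of y in place.

    Returns the 1-based position corrected, or None if y is consistent.
    """
    t0, t1 = encode(y, p)
    e = (t0 - checks[0]) % p     # syndrome S0: the error value
    s1 = (t1 - checks[1]) % p    # syndrome S1: position times error value
    if e == 0 and s1 == 0:
        return None
    if e == 0:
        raise ValueError("inconsistent syndromes: more than one error")
    k = s1 * pow(e, -1, p) % p   # modular inverse via pow (Python 3.8+)
    if not 1 <= k <= len(y):
        raise ValueError("error location out of range: more than one error")
    y[k - 1] = (y[k - 1] - e) % p
    return k
```

For example, with p = 11 and x = [3, 7, 2, 9, 5], adding 4 (mod 11) to the fourth entry gives syndromes S0 = 4 and S1 = 5. These locate position 5 * 4^(-1) = 5 * 3 = 15 = 4 (mod 11), and subtracting 4 restores the entry.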

To avoid numerical overflow, two methods that use modular arithmetic to
compute weighted checksums were proposed in [4] and [5]; we have derived a
new scheme. To compare these three methods, we consider fixed point
arithmetic, since it was used in [4] and [5]. Let r denote the word length.
Huang and Abraham [4] suggested computing the actual checksum values modulo
2^r, while Jou and Abraham [5] proposed a similar technique in which the
checksum column is computed modulo 2^r and the weighted checksum column is
computed modulo N, where N is the largest prime less than 2^(r+1). They
proved that for these choices of weights, the schemes detect errors as
desired. One difference between the methods is that the new one is all
preprocessing, while the other two add extra time to the algorithm because
the checksums must be computed using modular arithmetic. Furthermore, the
prime selected by Jou and Abraham is much larger than the prime in our work.
For example, in [5], if the word length r = 32, then the prime
N = 8,589,934,583. Our scheme is relatively independent of the word length;
it depends on the values of n and d. For n = 1000 (a large value in light of
the dimensions required by current signal processing problems) and d = 50,
the prime is p = 1051. We should also compare the sizes of the weight
elements for various weight generating schemes. Suppose that n = 500 and
d = 10. Then the largest weight generated by the scheme in [5] equals 244^1,
and by the scheme in [8], equals 500^. For our new scheme, the smallest prime
p satisfying p > n+d = 510 is p = 521, so every weight element is bounded
above by 521. Thus, for this moderately sized problem, our new scheme
generates a parity-check matrix whose elements are likewise of moderate size.
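The moduli quoted above are easy to reproduce. The short Python sketch below uses trial division and hypothetical helper names. It computes the smallest prime exceeding n + d, as our scheme requires, and, for comparison, the largest prime below 2^(r+1) used by Jou and Abraham [5].

```python
def is_prime(m: int) -> bool:
    """Trial division; adequate for numbers up to about 10^10."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    f = 3
    while f * f <= m:
        if m % f == 0:
            return False
        f += 2
    return True

def smallest_prime_above(k: int) -> int:
    """Smallest prime p with p > k."""
    p = k + 1
    while not is_prime(p):
        p += 1
    return p

def largest_prime_below(k: int) -> int:
    """Largest prime p with p < k (requires k > 2)."""
    p = k - 1
    while not is_prime(p):
        p -= 1
    return p

print(smallest_prime_above(500 + 10))   # n = 500, d = 10: 521
print(smallest_prime_above(1000 + 50))  # n = 1000, d = 50: 1051
print(largest_prime_below(2 ** 33))     # r = 32: 8589934583
```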


SCIENTIFIC  IMPACT  OF  RESEARCH 

Systolic arrays have been proposed and constructed for various
applications, especially in the signal processing area. An in-depth and systematic
investigation of fault-tolerance techniques for systolic arrays is both timely
and promising.

New VLSI signal processing systems built upon wafer scale integration
call for new fault tolerance techniques that recognize the unique
constraints and opportunities offered by this emerging technology. The task of
detecting and correcting transient errors is gaining in importance as
transistors get smaller and as we send more computing systems into outer space
(alpha particles are known to change bits).

We believe that the proposed investigations will yield approaches that are
more cost-effective and flexible than traditional techniques such as modular
redundancy and quadded logic, and may open new areas in the study of fault
tolerance.
OBJECTIVE

Our  task,  initiated  in  May  1988,  addresses  one  of  the  pressing  problems, 
if not THE problem, in the field of signal processing: how to perform
computations fast enough to meet stringent real-time requirements.

We use concurrent processing as a means of improving performance.
Concurrent processing should complement the expected continuing
improvements in device and packaging technologies in handling the loads
imposed by real-time signal processing and other time-critical tasks, and in
reducing computation latency.

Concurrent  processing  can  be  classified  according  to  two  levels:  task 
and instruction. In task-level concurrent processing, a given job is
partitioned into partially ordered subtasks, and these subtasks are assigned to
cooperating  processors.  These  processors  communicate,  as  they  must,  through 
shared  memory  and/or  other  interconnection  mechanisms. 

In instruction-level concurrent processing, multiple functional units,
installed in a single processor, are assigned instructions from a single
instruction stream. In such a structure, several instructions are processed at
the same time, which enhances performance. In our study,
we have concentrated on systems with multiple functional units (SMFU).

CRAY and CDC machines achieve their high performance through
exactly such configurations. Recently announced RISC processors, such as the
Intel 80960, Motorola 88000 and AMD 29000, also employ multiple functional units.
For example, the Motorola 88000 processor has one integer unit, one floating-point
unit, and six optional special-function units.

These  special  functional  units  may  very  well  be  the  SVD  processors, 
being  studied  by  Frank  Luk,  and  TRT  filters,  being  investigated  by  G.  Bilardi. 
They  may  serve  as  important  ingredients  in  signal  processing  computations. 

Our  task  investigates  the  design,  programming  and  implementation  of 
real-time  systems  through  instruction-level  concurrent  processing.  We  have 
been  working  on  problems  that  computer  architects  at  various  organizations 
have  just  begun  to  investigate. 


DISCUSSION  OF  STATE-OF-THE-ART 

Concurrent processing is an approach to meeting the "real-time
challenge" in signal processing; it takes many forms: parallel processors, systems
with  multiple  functional  units  (SMFU's),  and  various  combinations  of  the 
previous  two. 
Researchers have been investigating parallel processors intensively. A
well-connected set of processors is an inviting means of reducing
computation latency. Many applications have been identified for such
structures; problems, however, also remain: task partition and scheduling, task
synchronization, interprocessor interconnection and communication, system
flexibility with respect to task variations, incorporation of specialized
functional units, and under-utilization of processors.


A  block  diagram  of  a  system  with  multiple  functional  units  (SMFU)  is  given  in 
Figure  1.  Each  functional  unit  is  designed  to  perform  a  specific  set  of 
arithmetic  or  logical  operations  in  an  optimum  fashion.  High-speed  registers 
serve  as  a  buffer  between  the  memory  and  the  functional  units.  These 
registers  supply  operands  to  the  functional  units  and  receive  results  from 
them;  they  also  load  from  and  write  into  the  main  memory.  This  constitutes  a 
register-register  architecture,  with  the  main  memory  being  accessed  through 
load/store  operations. 

The  main  difference  between  a  "conventional  machine"  and  an  SMFU  is 
the  presence  of  multiple  and  "specialized"  functional  units  in  the  latter.  The 
instruction unit is charged with the task of issuing sufficient instructions
to these functional units to keep them busy. At a given machine cycle, multiple
instructions  may  be  executed  concurrently;  this  is  one  of  the  main  reasons 
that  SMFU's  yield  high  throughput. 

The interconnection network provides paths between registers and
functional units, and between registers and the main memory. The network
may  range  from  a  non-blocking,  fully  connected  crossbar  switch  to  a  set  of 
buses.  Its  selection  plays  a  critical  role  in  determining  the  performance  of  an 
SMFU  for  a  given  set  of  tasks. 

SMFU's,  such  as  CRAY  [1]  and  Floating  Point  Systems  [2]  machines,  are 
designed for general purpose applications. Recently announced
microprocessors such as the Motorola 88000, Intel 80960, and AMD 29000 also implement
this configuration; furthermore, these processors represent the new wave of
Reduced Instruction Set Computers (RISC). SMFU's for specific military
applications have been explored and developed at many organizations.

SMFU's achieve high throughput and low latency without incurring
the cost of performing task partition, task scheduling and task
synchronization; the system reacts easily to task variations.

A very important point in the design of SMFU's for real-time signal
processing is that when a specific operation is performed frequently, an
optimized hardware structure can be designed for it and incorporated into the
system as a functional unit.

For  example,  Luk  and  Bilardi  in  their  tasks  develop  efficient  structures 
for  matrix  and  filtering  operations;  these  structures  can  be  considered  as 
functional  units  in  a  real-time  signal  processing  SMFU. 

Two problems remain. First, the bandwidth between the main memory and
the execution complex has to be high enough that an adequate supply of
instructions and data is maintained. This problem is addressed by, among other
things, splitting the main memory into an instruction memory and a data
memory.
FIGURE 1: Multiple Functional Unit Processor: General Architecture
The second problem, a more serious one, is that such machines generally
do not make the best use of their functional units, as most of these units
stay idle; this is so because at most one instruction is issued per machine cycle.
In other words, these precious execution resources are being starved because
of an inadequate supply of instructions.

To elevate SMFU's to a higher level of performance, as we must, we have
to address the problem of instruction issuance. We do that in this task.

Two  notable  schemes  have  been  implemented  to  alleviate  the  starvation 
problem:  Thornton's  scoreboard  [3]  and  Tomasulo's  reservation  stations  with 
Common Data Bus [4]. In both cases, instructions are still issued in the
order in which they appear in the instruction stream: an instruction will not be
issued until all instructions that precede it have already been issued, and at
most one instruction is issued per cycle.

Dispatch  Stack 

To  remedy  this  situation,  we  formulated  a  "Dispatch  Stack"  (DS)  scheme 
[5],  which 

1.  issues  multiple  instructions  per  machine  cycle,  if  possible;  and 

2.  issues  instructions  out  of  sequence. 

According to the DS scheme, two or more instructions may be issued
concurrently as long as there are no data dependencies among them and there
are functional units available to execute them. Issuing multiple instructions per
machine cycle increases the rate at which instructions are dispatched to the
execution complex and thus enhances system performance.

Furthermore, an instruction can be issued to an available functional
unit as long as it is free of data dependencies, even though some of its
preceding instructions are still awaiting issue; the issuance of instructions is
thus non-sequential. With such a scheme, the instruction issuance rate is
enhanced, as ready instructions can be issued ahead of those
which precede them.
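The two issuing rules, multiple issue per cycle and issue out of sequence, can be illustrated with a simplified cycle-level model. The Python sketch below is only that. It is not the patented Dispatch Stack hardware, and every name in it is hypothetical. It assumes that each functional unit is fully pipelined with a fixed result latency, that the whole program is visible in the window, and that an instruction may not pass an earlier unissued instruction with which it has a read-after-write, write-after-read or write-after-write conflict. The same function also models conventional in-order, one-per-cycle issue for comparison.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Instr:
    name: str
    unit: str              # functional-unit type, e.g. "add" or "mul"
    dest: str
    srcs: Tuple[str, ...]

def conflicts(earlier: Instr, later: Instr) -> bool:
    """True if `later` may not be issued ahead of `earlier` (RAW, WAR or WAW)."""
    return (earlier.dest in later.srcs or later.dest in earlier.srcs
            or later.dest == earlier.dest)

def schedule(prog: List[Instr], units: Dict[str, Tuple[int, int]],
             max_issue: int, in_order: bool) -> List[int]:
    """Return the issue cycle of each instruction.

    units maps a unit type to (number of units, result latency in cycles).
    """
    ready: Dict[str, int] = {}           # cycle at which each register value is available
    issue_at = [-1] * len(prog)
    cycle = 0
    while -1 in issue_at:
        free = {u: count for u, (count, _) in units.items()}
        waiting = [i for i, c in enumerate(issue_at) if c == -1]
        issued_now = 0
        for pos, i in enumerate(waiting):
            if issued_now == max_issue:
                break
            ins = prog[i]
            # Conservative: also respects earlier instructions issued in this same cycle.
            blocked = any(conflicts(prog[j], ins) for j in waiting[:pos])
            operands_ready = all(ready.get(r, 0) <= cycle for r in ins.srcs)
            dest_free = ready.get(ins.dest, 0) <= cycle   # no pending write to dest
            if not blocked and operands_ready and dest_free and free[ins.unit] > 0:
                issue_at[i] = cycle
                free[ins.unit] -= 1
                ready[ins.dest] = cycle + units[ins.unit][1]
                issued_now += 1
            elif in_order:
                break                    # sequential issue: nothing passes a stalled instruction
        cycle += 1
    return issue_at

prog = [
    Instr("I1", "mul", "r1", ("r2", "r3")),   # three-cycle multiply
    Instr("I2", "add", "r4", ("r1", "r5")),   # needs r1 from I1
    Instr("I3", "add", "r6", ("r7", "r8")),   # independent
    Instr("I4", "mul", "r9", ("r7", "r2")),   # independent
]
units = {"add": (1, 1), "mul": (1, 3)}
print(schedule(prog, units, max_issue=1, in_order=True))    # [0, 3, 4, 5]
print(schedule(prog, units, max_issue=4, in_order=False))   # [0, 3, 0, 1]
```

In this toy program, I3 and I4 issue ahead of the stalled I2 under the dispatch-stack rule. The last result is then available at cycle 4 instead of cycle 8.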

The DS is conceptually a "device" that displays and checks
dependencies among instructions in an instruction stream; it can be viewed as a
window on the instruction stream with certain capabilities. The dispatch stack
can be realized in either software or hardware, and it will be exploited in the
programming, design and implementation of SMFU's for real-time signal
processing.

To use such processors for real-time and transaction-oriented
applications, we also have to address the issue of how to handle interrupts
promptly and efficiently.

Interrupts can be classified into three types: external interrupts,
exception traps, and software traps. External interrupts are generated from or
by the environment, for example for the processing of a newly arrived task.
Abnormalities encountered in system processing, such as division by zero,
overflow, or illegal operations, generate exception traps. Software traps are
instructions which initiate interrupt requests; these traps provide a means of
controlling certain software applications.
Interrupt handling mechanisms are evaluated on the following three criteria:

Precise  processor  state:  When  an  interrupt  request  is  received,  the 
processor  must  have  the  capability  to  save  its  processor  state  precisely; 

Latency:  An  interrupt  handling  mechanism  should  be  judged  by  the 
latency  between  the  receipt  of  an  interrupt  request  and  the  completion  of 
saving  the  processor  state. 

Cost: The hardware and software costs incurred by the
installation of an interrupt handling mechanism must be taken into
account. Furthermore, we have to identify precisely the performance
degradation that the interrupt handling mechanism may inflict on
the system.

The CRAY machines [1] have multiple functional units and do allow
instructions to be executed out of order. They generally allow instructions under
execution to complete before the processor state is stored, and consequently
pay a penalty in long latency. In the IBM 360/91 [4], a precise interrupt is
realized by allowing all issued instructions to complete their execution; this
results in considerable latency. If an imprecise interrupt is generated, the
processor state of the system is lost and the system cannot be restarted
precisely at the interrupted point.

Two other approaches to interrupt handling for systems with multiple
functional units have recently been proposed. By installing two or more
additional "checkpoints" [6], the system can respond to an interrupt request by
"retreating" to one of these checkpoints. Clearly, this approach will
degrade system performance, both in processor speed and in the time
required to restore a consistent processor state upon receiving an interrupt
request. The system will be slowed down by the movement of state
information as the states change, and by the additional read instruction that
must precede every instruction that alters the memory. A further performance
penalty is incurred in restoring the memory to a consistent state when an
interrupt request is received.

Additional shift registers can also be installed in the processor to make
certain that results are loaded into registers in order [7], even though they
may be produced out of sequence. This approach degrades system performance
considerably and also incurs additional hardware costs.
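The principle behind loading results into registers in order can be sketched in software. Results may be produced out of sequence, but they are written to the register file strictly in program order. At any instant the registers therefore reflect a precise state, at the price of delaying register updates. The minimal Python model below uses hypothetical names and is a generic illustration, not the specific circuit of [7]. It commits only the longest run of completed instructions that starts with the oldest uncommitted instruction.

```python
from typing import Dict, List, Optional, Tuple

class InOrderCommit:
    """Buffer out-of-order results; update the registers only in program order."""

    def __init__(self, n_instructions: int) -> None:
        self.pending: List[Optional[Tuple[str, int]]] = [None] * n_instructions
        self.next_to_commit = 0
        self.registers: Dict[str, int] = {}

    def complete(self, seq: int, dest: str, value: int) -> None:
        """Record that instruction `seq` produced `value` for register `dest`."""
        self.pending[seq] = (dest, value)
        # Commit every completed instruction from the oldest uncommitted one onward.
        while (self.next_to_commit < len(self.pending)
               and self.pending[self.next_to_commit] is not None):
            reg, val = self.pending[self.next_to_commit]
            self.registers[reg] = val
            self.next_to_commit += 1

    def interrupt(self) -> Tuple[int, Dict[str, int]]:
        """Precise state: the restart point and the registers as of that point."""
        return self.next_to_commit, dict(self.registers)
```

If instruction 2 finishes before instructions 0 and 1, an interrupt taken at that moment reports restart point 0 with no register changed. The finished result is simply discarded and recomputed after the interrupt is serviced.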

Branching is an indispensable ingredient in any meaningful program;
however, it injects performance-damping turbulence into the instruction
stream. How to handle conditional branching efficiently remains a difficult
challenge for computer architects. A clear survey of possible techniques for
handling conditional branches can be found in [8]. The proposed and
implemented systems discussed previously do not approach this opportunity
aggressively. Some implement small, tentative prefetching.
Checkpointing can again be applied to allow instruction execution on an
assumed path. If the assumption proves incorrect, a consistent
processor state can be restored from the processor state saved at
the checkpoints [6]. In most cases, however, the supply of instructions is
disrupted by the presence of conditional branch instructions.
We have received a patent: "Instruction Issuing Mechanism for
Processors with Multiple Functional Units," U.S. Patent 4,807,115, February
21, 1989. This patent essentially covers the Dispatch Stack that we have
developed. We feel that we have been addressing the instruction issuing
question that processor designers, especially those concerned with RISC
processors, have just begun to explore; this is confirmed by our conversations
with several companies.

The Dispatch Stack (DS) that we have formulated performs dynamic instruction scheduling at run time. It is critical that the requisite data dependency checking among the instructions residing in the DS, the fetching of instructions into the open slots in the DS, and the identification of the instructions that can be issued in a given machine cycle be carried out without lengthening the machine cycle time, which can be in the range of 10 nanoseconds for certain processors. We have completed a study of the implementation of the Dispatch Stack and have formulated ways to realize the time-critical circuits so that all functions specified for the Dispatch Stack can be performed even within an extremely short machine cycle.
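
The per-cycle work of the DS can be illustrated with a simplified software model. This is a sketch under stated assumptions, not the patented circuit. The names (`Instr`, `conflicts`, `issue_cycle`, `run`) are hypothetical. An instruction may issue if it has no read-after-write, write-after-read, or write-after-write conflict with any earlier instruction still in the stack and a functional unit of the right type is free. Each unit accepts one instruction per cycle, and execution latency is ignored, so a result is available to dependent instructions in the next cycle. Issued instructions leave the stack, the remaining ones are compressed, and new instructions are fetched into the open slots.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    unit: str    # functional-unit type needed, e.g. "add" or "mul"
    dest: str    # destination register
    srcs: tuple  # source registers


def conflicts(earlier, later):
    """True if `later` may not issue ahead of, or together with, `earlier`."""
    raw = earlier.dest in later.srcs   # later reads what earlier writes
    war = later.dest in earlier.srcs   # later overwrites what earlier reads
    waw = later.dest == earlier.dest   # both write the same register
    return raw or war or waw


def issue_cycle(window, free_units):
    """Issue every instruction that is free of conflicts with all earlier
    instructions still in the window and has a functional unit available."""
    units = dict(free_units)
    issued = []
    for i, ins in enumerate(window):
        if any(conflicts(prev, ins) for prev in window[:i]):
            continue
        if units.get(ins.unit, 0) > 0:
            units[ins.unit] -= 1
            issued.append(i)
    # Compress the window: unissued instructions keep their relative order.
    remaining = [ins for i, ins in enumerate(window) if i not in issued]
    return [window[i].name for i in issued], remaining


def run(program, window_size, free_units):
    pending, window, schedule = list(program), [], []
    while pending or window:
        # Fetch new instructions into the open slots, in program order.
        while pending and len(window) < window_size:
            window.append(pending.pop(0))
        names, window = issue_cycle(window, free_units)
        if not names:
            raise ValueError("no functional unit for " + window[0].unit)
        schedule.append(names)
    return schedule


program = [
    Instr("i1", "mul", "R1", ("R2", "R3")),
    Instr("i2", "add", "R4", ("R1", "R5")),  # needs i1's result
    Instr("i3", "add", "R6", ("R7", "R8")),
    Instr("i4", "mul", "R9", ("R2", "R8")),
]
print(run(program, window_size=4, free_units={"add": 1, "mul": 1}))
# [['i1', 'i3'], ['i2', 'i4']]
```

With one adder and one multiplier, the four instructions issue in two cycles, two per cycle and partly out of order (i3 ahead of i2), whereas strictly sequential single issue would take four cycles.
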

Multiple and out-of-order instruction issue in a modern processor, as implemented with the proposed DS, leads to situations in which several instructions are executing concurrently and an instruction may complete before instructions that precede it in the instruction stream. Precise and prompt interrupt handling is essential if these high-performance processors are to be successfully integrated into real-time computing systems. It is also important that the interrupt handling mechanism not degrade system performance in the absence of interrupt requests. We have formulated a preliminary solution to this problem. The solution makes use of the DS already present in the processor. It calls for no additional components; it responds to interrupt requests promptly; and it imposes no performance penalty when no interrupt requests are present. Much work remains to be done.
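
The difficulty can be made concrete. An interrupt is precise if every instruction before the interrupt point has completed and no instruction after it has altered the processor state. With out-of-order completion, the natural interrupt point is the oldest unfinished instruction, and any younger instruction that has already finished violates precision. The hypothetical function below only identifies that point and those instructions for a given completion pattern; it illustrates the problem and is not the DS-based solution, whose details are not given here.

```python
def precise_interrupt_point(completed):
    """`completed[k]` is True if the k-th instruction in program order has
    finished. Returns (index of the oldest unfinished instruction, indices
    of younger instructions that have already finished)."""
    for i, done in enumerate(completed):
        if not done:
            ahead = [j for j in range(i + 1, len(completed)) if completed[j]]
            return i, ahead
    return len(completed), []  # everything finished: no conflict


print(precise_interrupt_point([True, False, True, False, True]))
# (1, [2, 4])
```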

We have initiated investigations on the handling of conditional branches. Test runs have verified the correctness of our simulator.

With the delivery of four HP 370 workstations in the next few weeks, we will be able to perform extensive simulations to evaluate various schemes for interrupt and branch handling. Both the Livermore Loops and the Dhrystone benchmarks will be used.

On another front, we have been investigating the relationship between processors with multiple functional units (SMFUs) and systolic arrays, which are being investigated by Frank Luk and others. We are implementing on SMFUs some of the basic and important matrix operations that have been carried out efficiently with systolic arrays. It should be pointed out that SMFUs are general-purpose systems, while systolic arrays are designed for specific operations.
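
As a generic illustration only (the arrays studied by Luk and the SMFU mappings under development are not described here), the sketch below simulates, cycle by cycle, a common output-stationary systolic array for matrix multiplication. Cell (i, j) accumulates one entry of the product; rows of A enter from the left and columns of B from the top, each skewed by one cycle per row or column, so A[i][k] and B[k][j] meet in cell (i, j) at cycle i + j + k. Every cell performs the same multiply-add each cycle, which is the fixed, operation-specific dataflow that distinguishes a systolic array from a general-purpose SMFU. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle model of an output-stationary systolic array
    computing A (n x m) times B (m x p) on an n x p grid of cells."""
    n, m, p = len(A), len(B), len(B[0])
    a_reg = [[0] * p for _ in range(n)]  # A operand held in each cell
    b_reg = [[0] * p for _ in range(n)]  # B operand held in each cell
    C = [[0] * p for _ in range(n)]      # accumulator in each cell
    for t in range(n + m + p - 2):
        new_a = [[0] * p for _ in range(n)]
        new_b = [[0] * p for _ in range(n)]
        for i in range(n):
            for j in range(p):
                # A values move one cell right per cycle; row i enters
                # the left edge skewed by i cycles (zeros outside the data).
                if j == 0:
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < m else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                # B values move one cell down per cycle; column j enters
                # the top edge skewed by j cycles.
                if i == 0:
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < m else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # Every cell performs one multiply-add on the operands it now holds.
        for i in range(n):
            for j in range(p):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```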


SCIENTIFIC  IMPACT  OF  RESEARCH 

Processors with multiple functional units, such as the CRAY machines and the emerging RISC microprocessors, represent an important architecture that can be exploited to implement real-time signal processing systems. Our task addresses a vexing problem for such structures: the underutilization of functional units.

It is expected that the Dispatch Stack we have formulated and tested will enable such processors to issue more than one instruction per machine cycle and to issue instructions out of order. These features can significantly enhance the performance of such processors without raising the clock rate. We believe that our study provides timely and much-needed understanding for defining the processors that will follow the recently announced Motorola 88000, Intel 80960, and other RISC processors. We have received several inquiries about the Dispatch Stack.

Furthermore, our continuing work on interrupt handling will remove yet another obstacle and help such processors become the dominant computer structures for real-time signal processing and other applications.